* [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime
@ 2024-06-17 15:07 Dragos Tatulea
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
According to the measurements of vDPA Live Migration downtime [0], one
large source of downtime is the creation of hardware VQs and their
associated resources on the devices at the destination VM.
Previous series ([1], [2]) addressed the source part of the Live
Migration downtime. This series addresses the destination part: instead
of creating hardware VQs and their dependent resources when the device
goes into the DRIVER_OK state (which is during downtime), create "blank"
VQs at device creation time and only modify them to the received
configuration before starting the VQs (DRIVER_OK state).
The caveat here is that mlx5_vdpa VQs don't support modifying the VQ
size. VQs will be created with a convenient default size and when this
size is changed, they will be recreated.
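To illustrate, here is a minimal standalone C model of the intended flow
(illustrative only, not the driver code; the real logic spans several
patches in this series):

#include <stdbool.h>
#include <stdio.h>

enum vq_state { VQ_INIT, VQ_READY };

struct vq_model {
	enum vq_state state;
	int num_ent;
};

/* device add time: create a blank HW VQ with a default size */
static void create_blank(struct vq_model *vq, int default_size)
{
	vq->num_ent = default_size;
	vq->state = VQ_INIT;
}

/* DRIVER_OK time: apply the received config; the size is not
 * modifiable, so a different size forces a recreate */
static void apply_config(struct vq_model *vq, int num_ent)
{
	if (num_ent != vq->num_ent)
		create_blank(vq, num_ent);	/* recreate with the new size */
	/* modify the remaining fields, then move Init -> Ready */
	vq->state = VQ_READY;
}

int main(void)
{
	struct vq_model vq = { 0 };

	create_blank(&vq, 256);		/* at .dev_add */
	apply_config(&vq, 512);		/* at DRIVER_OK, with a different size */
	printf("state=%d num_ent=%d\n", vq.state, vq.num_ent);
	return 0;
}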
The beginning of the series consists of refactorings. After that, some
preparations are made:
- Allow creation of "blank" VQs by not configuring them during
create_virtqueue() if there are no modified fields.
- The VQ Init -> Ready state transition is consolidated into resume_vq().
- Add error handling to suspend/resume code paths.
Then VQs are created at device creation time.
Finally, the special cases that need full VQ resource recreation are
handled.
On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQPs, the full VQ
resource creation + resume time was ~370 ms. Now it's down to ~60 ms
(only VQ config and resume). The measurements were done on a
ConnectX-6 Dx based vDPA device.
[0] https://lore.kernel.org/qemu-devel/1701970793-6865-1-git-send-email-si-wei.liu@oracle.com/
[1] https://lore.kernel.org/lkml/20231018171456.1624030-2-dtatulea@nvidia.com
[2] https://lore.kernel.org/lkml/20231219180858.120898-1-dtatulea@nvidia.com
---
Dragos Tatulea (23):
vdpa/mlx5: Clarify meaning through function rename
vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
vdpa/mlx5: Drop redundant code
vdpa/mlx5: Drop redundant check in teardown_virtqueues()
vdpa/mlx5: Iterate over active VQs during suspend/resume
vdpa/mlx5: Remove duplicate suspend code
vdpa/mlx5: Initialize and reset device with one queue pair
vdpa/mlx5: Clear and reinitialize software VQ data on reset
vdpa/mlx5: Add support for modifying the virtio_version VQ field
vdpa/mlx5: Add support for modifying the VQ features field
vdpa/mlx5: Set an initial size on the VQ
vdpa/mlx5: Start off rqt_size with max VQPs
vdpa/mlx5: Set mkey modified flags on all VQs
vdpa/mlx5: Allow creation of blank VQs
vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()
vdpa/mlx5: Add error code for suspend/resume VQ
vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
vdpa/mlx5: Forward error in suspend/resume device
vdpa/mlx5: Use suspend/resume during VQP change
vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
vdpa/mlx5: Re-create HW VQs under certain conditions
vdpa/mlx5: Don't reset VQs more than necessary
vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
drivers/vdpa/mlx5/net/mlx5_vnet.c | 422 +++++++++++++++++++++++++------------
drivers/vdpa/mlx5/net/mlx5_vnet.h | 2 +
include/linux/mlx5/mlx5_ifc_vdpa.h | 2 +
3 files changed, 291 insertions(+), 135 deletions(-)
---
base-commit: c8fae27d141a32a1624d0d0d5419d94252824498
change-id: 20240617-stage-vdpa-vq-precreate-76df151bed08
Best regards,
--
Dragos Tatulea <dtatulea@nvidia.com>
* [PATCH vhost 01/23] vdpa/mlx5: Clarify meaning through function rename
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
setup_driver()/teardown_driver() are a bit vague: these functions set up
and tear down the virtqueue resources, so rename them to
setup/teardown_vq_resources().
Same for alloc_resources()/free_resources(): they handle fixed resources
that are meant to exist for the whole device lifetime, so rename them to
alloc/free_fixed_resources().
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index ecfc16151d61..3422da0e344b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -144,10 +144,10 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
return idx <= mvdev->max_idx;
}
-static void free_resources(struct mlx5_vdpa_net *ndev);
+static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
static void init_mvqs(struct mlx5_vdpa_net *ndev);
-static int setup_driver(struct mlx5_vdpa_dev *mvdev);
-static void teardown_driver(struct mlx5_vdpa_net *ndev);
+static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
+static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
static bool mlx5_vdpa_debug;
@@ -2848,7 +2848,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
if (err)
return err;
- teardown_driver(ndev);
+ teardown_vq_resources(ndev);
}
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
@@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
if (teardown) {
restore_channels_info(ndev);
- err = setup_driver(mvdev);
+ err = setup_vq_resources(mvdev);
if (err)
return err;
}
@@ -2873,7 +2873,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
}
/* reslock must be held for this function */
-static int setup_driver(struct mlx5_vdpa_dev *mvdev)
+static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
@@ -2931,7 +2931,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
}
/* reslock must be held for this function */
-static void teardown_driver(struct mlx5_vdpa_net *ndev)
+static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
{
WARN_ON(!rwsem_is_locked(&ndev->reslock));
@@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
- err = setup_driver(mvdev);
+ err = setup_vq_resources(mvdev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
goto err_driver;
@@ -3040,7 +3040,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
- teardown_driver(ndev);
+ teardown_vq_resources(ndev);
clear_vqs_ready(ndev);
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
@@ -3197,7 +3197,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
ndev = to_mlx5_vdpa_ndev(mvdev);
- free_resources(ndev);
+ free_fixed_resources(ndev);
mlx5_vdpa_destroy_mr_resources(mvdev);
if (!is_zero_ether_addr(ndev->config.mac)) {
pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
@@ -3467,7 +3467,7 @@ static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
return 0;
}
-static int alloc_resources(struct mlx5_vdpa_net *ndev)
+static int alloc_fixed_resources(struct mlx5_vdpa_net *ndev)
{
struct mlx5_vdpa_net_resources *res = &ndev->res;
int err;
@@ -3494,7 +3494,7 @@ static int alloc_resources(struct mlx5_vdpa_net *ndev)
return err;
}
-static void free_resources(struct mlx5_vdpa_net *ndev)
+static void free_fixed_resources(struct mlx5_vdpa_net *ndev)
{
struct mlx5_vdpa_net_resources *res = &ndev->res;
@@ -3735,7 +3735,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
goto err_res;
}
- err = alloc_resources(ndev);
+ err = alloc_fixed_resources(ndev);
if (err)
goto err_mr;
@@ -3758,7 +3758,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
err_reg:
destroy_workqueue(mvdev->wq);
err_res2:
- free_resources(ndev);
+ free_fixed_resources(ndev);
err_mr:
mlx5_vdpa_destroy_mr_resources(mvdev);
err_res:
--
2.45.1
* [PATCH vhost 02/23] vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Make setup_vq_resources() and teardown_vq_resources() symmetrical by
changing the setup_vq_resources() parameter from struct mlx5_vdpa_dev
to struct mlx5_vdpa_net.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 3422da0e344b..1ad281cbc541 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -146,7 +146,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
static void init_mvqs(struct mlx5_vdpa_net *ndev);
-static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
static bool mlx5_vdpa_debug;
@@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
if (teardown) {
restore_channels_info(ndev);
- err = setup_vq_resources(mvdev);
+ err = setup_vq_resources(ndev);
if (err)
return err;
}
@@ -2873,9 +2873,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
}
/* reslock must be held for this function */
-static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev)
{
- struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
int err;
WARN_ON(!rwsem_is_locked(&ndev->reslock));
@@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
- err = setup_vq_resources(mvdev);
+ err = setup_vq_resources(ndev);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
goto err_driver;
--
2.45.1
* [PATCH vhost 03/23] vdpa/mlx5: Drop redundant code
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The second loop in init_mvqs() never executes because the first loop
already iterates up to max_vqs, so there is nothing left for it to
initialize.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1ad281cbc541..b4d9ef4f66c8 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3519,12 +3519,6 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
mvq->fwqp.fw = true;
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
}
- for (; i < ndev->mvdev.max_vqs; i++) {
- mvq = &ndev->vqs[i];
- memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
- mvq->index = i;
- mvq->ndev = ndev;
- }
}
struct mlx5_vdpa_mgmtdev {
--
2.45.1
* [PATCH vhost 04/23] vdpa/mlx5: Drop redundant check in teardown_virtqueues()
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The mvq->initialized check is already done inside teardown_vq(), so the
one in teardown_virtqueues() is redundant.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b4d9ef4f66c8..96782b34e2b2 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2559,16 +2559,10 @@ static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev)
static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
{
- struct mlx5_vdpa_virtqueue *mvq;
int i;
- for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
- mvq = &ndev->vqs[i];
- if (!mvq->initialized)
- continue;
-
- teardown_vq(ndev, mvq);
- }
+ for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--)
+ teardown_vq(ndev, &ndev->vqs[i]);
}
static void update_cvq_info(struct mlx5_vdpa_dev *mvdev)
--
2.45.1
* [PATCH vhost 05/23] vdpa/mlx5: Iterate over active VQs during suspend/resume
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
No need to iterate over the maximum number of VQs: only the active VQs
(up to cur_num_vqs) need to be suspended or resumed.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 96782b34e2b2..51630b1935f4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1504,7 +1504,7 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
{
int i;
- for (i = 0; i < ndev->mvdev.max_vqs; i++)
+ for (i = 0; i < ndev->cur_num_vqs; i++)
suspend_vq(ndev, &ndev->vqs[i]);
}
@@ -1522,7 +1522,7 @@ static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
static void resume_vqs(struct mlx5_vdpa_net *ndev)
{
- for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+ for (int i = 0; i < ndev->cur_num_vqs; i++)
resume_vq(ndev, &ndev->vqs[i]);
}
--
2.45.1
* [PATCH vhost 06/23] vdpa/mlx5: Remove duplicate suspend code
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Use the dedicated suspend_vqs() function instead of open-coding the VQ
suspend loop in mlx5_vdpa_suspend().
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 51630b1935f4..eca6f68c2eda 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3355,17 +3355,12 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
- struct mlx5_vdpa_virtqueue *mvq;
- int i;
mlx5_vdpa_info(mvdev, "suspending device\n");
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
- for (i = 0; i < ndev->cur_num_vqs; i++) {
- mvq = &ndev->vqs[i];
- suspend_vq(ndev, mvq);
- }
+ suspend_vqs(ndev);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
up_write(&ndev->reslock);
--
2.45.1
* [PATCH vhost 07/23] vdpa/mlx5: Initialize and reset device with one queue pair
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The virtio spec says that a vdpa device should start off with one queue
pair. The driver is already compliant.
This patch moves the initialization to device add and reset times. This
is done in preparation for the pre-creation of hardware virtqueues at
device add time.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index eca6f68c2eda..c8b5c87f001d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -48,6 +48,16 @@ MODULE_LICENSE("Dual BSD/GPL");
#define MLX5V_UNTAGGED 0x1000
+/* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
+ * 5.1.6.5.5 "Device operation in multiqueue mode":
+ *
+ * Multiqueue is disabled by default.
+ * The driver enables multiqueue by sending a command using class
+ * VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
+ * operation, as follows: ...
+ */
+#define MLX5V_DEFAULT_VQ_COUNT 2
+
struct mlx5_vdpa_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
@@ -2713,16 +2723,6 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
else
ndev->rqt_size = 1;
- /* Device must start with 1 queue pair, as per VIRTIO v1.2 spec, section
- * 5.1.6.5.5 "Device operation in multiqueue mode":
- *
- * Multiqueue is disabled by default.
- * The driver enables multiqueue by sending a command using class
- * VIRTIO_NET_CTRL_MQ. The command selects the mode of multiqueue
- * operation, as follows: ...
- */
- ndev->cur_num_vqs = 2;
-
update_cvq_info(mvdev);
return err;
}
@@ -3040,7 +3040,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
- ndev->cur_num_vqs = 0;
+ ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
ndev->mvdev.cvq.received_desc = 0;
ndev->mvdev.cvq.completed_desc = 0;
memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 1));
@@ -3643,6 +3643,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
err = -ENOMEM;
goto err_alloc;
}
+ ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
init_mvqs(ndev);
allocate_irqs(ndev);
--
2.45.1
* [PATCH vhost 08/23] vdpa/mlx5: Clear and reinitialize software VQ data on reset
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The hardware VQ configuration is mirrored by data in struct
mlx5_vdpa_virtqueue. Instead of clearing just a few fields at reset,
fully clear the struct and reinitialize it with the appropriate default
values.
As clear_vqs_ready() is used only during reset, get rid of it.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index c8b5c87f001d..de013b5a2815 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2941,18 +2941,6 @@ static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
ndev->setup = false;
}
-static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
-{
- int i;
-
- for (i = 0; i < ndev->mvdev.max_vqs; i++) {
- ndev->vqs[i].ready = false;
- ndev->vqs[i].modified_fields = 0;
- }
-
- ndev->mvdev.cvq.ready = false;
-}
-
static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
{
struct mlx5_control_vq *cvq = &mvdev->cvq;
@@ -3035,12 +3023,14 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
teardown_vq_resources(ndev);
- clear_vqs_ready(ndev);
+ init_mvqs(ndev);
+
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
+ ndev->mvdev.cvq.ready = false;
ndev->mvdev.cvq.received_desc = 0;
ndev->mvdev.cvq.completed_desc = 0;
memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 1));
--
2.45.1
* [PATCH vhost 09/23] vdpa/mlx5: Add support for modifying the virtio_version VQ field
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 ++++++++++++++++
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index de013b5a2815..b60e8897717b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1283,6 +1283,10 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION)
+ MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
+ !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
+
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
@@ -2709,6 +2713,7 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ u64 old_features = mvdev->actual_features;
int err;
print_features(mvdev, features, true);
@@ -2723,6 +2728,17 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
else
ndev->rqt_size = 1;
+ /* Interested in changes of vq features only. */
+ if (get_features(old_features) != get_features(mvdev->actual_features)) {
+ for (int i = 0; i < mvdev->max_vqs; ++i) {
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+ mvq->modified_fields |= (
+ MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION
+ );
+ }
+ }
+
update_cvq_info(mvdev);
return err;
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 40371c916cf9..34f27c01cec9 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -148,6 +148,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX = (u64)1 << 7,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX = (u64)1 << 8,
+ MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION = (u64)1 << 10,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY = (u64)1 << 11,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};
--
2.45.1
* [PATCH vhost 10/23] vdpa/mlx5: Add support for modifying the VQ features field
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
This is done in preparation for the pre-creation of hardware virtqueues
at device add time.
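For reference, the feature mask ends up split across two firmware
fields (queue_feature_bit_mask_12_3 and queue_feature_bit_mask_2_0) in
the diff below. A standalone sketch of that split, assuming
get_features() packs the relevant feature bits into a 13-bit mask
(bits 0..12):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t mlx_features = 0x1acd;	/* example 13-bit mask, not real features */
	uint16_t hi = mlx_features >> 3;	/* queue_feature_bit_mask_12_3: bits 3..12 */
	uint16_t lo = mlx_features & 7;	/* queue_feature_bit_mask_2_0: bits 0..2 */

	printf("hi=0x%x lo=0x%x\n", hi, lo);	/* prints hi=0x359 lo=0x5 */
	return 0;
}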
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 +++++++++++-
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b60e8897717b..245b5dac98d3 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1287,6 +1287,15 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
!!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES) {
+ u16 mlx_features = get_features(ndev->mvdev.actual_features);
+
+ MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
+ mlx_features >> 3);
+ MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_2_0,
+ mlx_features & 7);
+ }
+
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
@@ -2734,7 +2743,8 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
mvq->modified_fields |= (
- MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION
+ MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION |
+ MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES
);
}
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 34f27c01cec9..58dfa2ee7c83 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -150,6 +150,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX = (u64)1 << 8,
MLX5_VIRTQ_MODIFY_MASK_QUEUE_VIRTIO_VERSION = (u64)1 << 10,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY = (u64)1 << 11,
+ MLX5_VIRTQ_MODIFY_MASK_QUEUE_FEATURES = (u64)1 << 12,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};
--
2.45.1
* [PATCH vhost 11/23] vdpa/mlx5: Set an initial size on the VQ
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The virtqueue size is a prerequisite for setting up any virtqueue
resources. For the upcoming optimization of creating virtqueues at
device add time, the virtqueue size has to be configured upfront.
Store the default queue size in struct mlx5_vdpa_net to make it easy to
pre-configure this default value via the vdpa tool in the future.
The !mvq->num_ent check in setup_vq() will now always be false, so
remove it.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 ++++---
drivers/vdpa/mlx5/net/mlx5_vnet.h | 1 +
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 245b5dac98d3..1181e0ac3671 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -58,6 +58,8 @@ MODULE_LICENSE("Dual BSD/GPL");
*/
#define MLX5V_DEFAULT_VQ_COUNT 2
+#define MLX5V_DEFAULT_VQ_SIZE 256
+
struct mlx5_vdpa_cq_buf {
struct mlx5_frag_buf_ctrl fbc;
struct mlx5_frag_buf frag_buf;
@@ -1445,9 +1447,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
u16 idx = mvq->index;
int err;
- if (!mvq->num_ent)
- return 0;
-
if (mvq->initialized)
return 0;
@@ -3523,6 +3522,7 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
mvq->ndev = ndev;
mvq->fwqp.fw = true;
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
+ mvq->num_ent = ndev->default_queue_size;
}
}
@@ -3660,6 +3660,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
goto err_alloc;
}
ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
+ ndev->default_queue_size = MLX5V_DEFAULT_VQ_SIZE;
init_mvqs(ndev);
allocate_irqs(ndev);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
index 90b556a57971..2ada29767cc5 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -58,6 +58,7 @@ struct mlx5_vdpa_net {
bool setup;
u32 cur_num_vqs;
u32 rqt_size;
+ u16 default_queue_size;
bool nb_registered;
struct notifier_block nb;
struct vdpa_callback config_cb;
--
2.45.1
* [PATCH vhost 12/23] vdpa/mlx5: Start off rqt_size with max VQPs
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Currently rqt_size is initialized during driver feature configuration.
That's because it is the earliest moment when the device knows if MQ
(multi queue) is on or off.
Shift this configuration earlier, to device creation time. This implies
that non-MQ devices will have a larger RQT size, but the configuration
will still be correct.
This is done in preparation for the pre-creation of hardware virtqueues
at device add time. Once that change is added, the RQT will be created
at device creation time, so it needs to be initialized to its maximum
size.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1181e0ac3671..0201c6fe61e1 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2731,10 +2731,6 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
return err;
ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
- if (ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ))
- ndev->rqt_size = mlx5vdpa16_to_cpu(mvdev, ndev->config.max_virtqueue_pairs);
- else
- ndev->rqt_size = 1;
/* Interested in changes of vq features only. */
if (get_features(old_features) != get_features(mvdev->actual_features)) {
@@ -3719,8 +3715,12 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
goto err_alloc;
}
- if (device_features & BIT_ULL(VIRTIO_NET_F_MQ))
+ if (device_features & BIT_ULL(VIRTIO_NET_F_MQ)) {
config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2);
+ ndev->rqt_size = max_vqs / 2;
+ } else {
+ ndev->rqt_size = 1;
+ }
ndev->mvdev.mlx_features = device_features;
mvdev->vdev.dma_dev = &mdev->pdev->dev;
--
2.45.1
* [PATCH vhost 13/23] vdpa/mlx5: Set mkey modified flags on all VQs
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Set the mkey modified flags on all VQs, not only the currently active
ones. Otherwise, when the remaining virtqueues are moved from INIT to
READY, the latest mkey will not be set appropriately.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0201c6fe61e1..0abe01fd20e9 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2868,7 +2868,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
mlx5_vdpa_update_mr(mvdev, new_mr, asid);
- for (int i = 0; i < ndev->cur_num_vqs; i++)
+ for (int i = 0; i < mvdev->max_vqs; i++)
ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
--
2.45.1
* [PATCH vhost 14/23] vdpa/mlx5: Allow creation of blank VQs
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Based on the filled flag, create VQs that are filled or blank.
Blank VQs will be filled in later through VQ modify.
Downstream patches will make use of this to pre-create blank VQs at
vdpa device creation.
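For clarity, a small standalone model of the filled/blank distinction
(illustrative only; the actual flag names and logic are in the
create_virtqueue() changes below):

#include <stdbool.h>
#include <stdint.h>

#define MODIFY_MKEY		(1u << 0)
#define MODIFY_DESC_MKEY	(1u << 1)

struct vq_model {
	uint32_t modified_fields;
	bool fully_configured;
};

static void create_vq(struct vq_model *vq, bool filled)
{
	if (filled) {
		/* addresses, indices and mkeys are programmed at create time */
		vq->fully_configured = true;
	} else {
		/* blank VQ: defer the mkeys to the later modify-to-ready by
		 * marking those fields as modified */
		vq->modified_fields |= MODIFY_MKEY | MODIFY_DESC_MKEY;
	}
}

int main(void)
{
	struct vq_model at_driver_ok = { 0 }, at_dev_add = { 0 };

	create_vq(&at_driver_ok, true);	/* filled create at DRIVER_OK */
	create_vq(&at_dev_add, false);	/* blank create at dev_add, filled in later */
	return 0;
}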
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 85 +++++++++++++++++++++++++--------------
1 file changed, 55 insertions(+), 30 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0abe01fd20e9..a2dd8fd58afa 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -158,7 +158,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
static void init_mvqs(struct mlx5_vdpa_net *ndev);
-static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled);
static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
static bool mlx5_vdpa_debug;
@@ -874,13 +874,16 @@ static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)
pci_msix_can_alloc_dyn(mvdev->mdev->pdev);
}
-static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int create_virtqueue(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ bool filled)
{
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
struct mlx5_vdpa_mr *vq_mr;
struct mlx5_vdpa_mr *vq_desc_mr;
+ u64 features = filled ? mvdev->actual_features : mvdev->mlx_features;
void *obj_context;
u16 mlx_features;
void *cmd_hdr;
@@ -898,7 +901,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
goto err_alloc;
}
- mlx_features = get_features(ndev->mvdev.actual_features);
+ mlx_features = get_features(features);
cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, general_obj_in_cmd_hdr);
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
@@ -906,8 +909,6 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
- MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
- MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
mlx_features >> 3);
MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_2_0,
@@ -929,17 +930,36 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
- !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
- MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
- MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
- MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
- vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
- if (vq_mr)
- MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
-
- vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
- if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
- MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
+ !!(features & BIT_ULL(VIRTIO_F_VERSION_1)));
+
+ if (filled) {
+ MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+ MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
+
+ MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+ MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+ MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+
+ vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ if (vq_mr)
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
+
+ vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ if (vq_desc_mr &&
+ MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+ MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, vq_desc_mr->mkey);
+ } else {
+ /* If there is no mr update, make sure that the existing ones are set
+ * during the vq modify to ready.
+ */
+ vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ if (vq_mr)
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
+
+ vq_desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ if (vq_desc_mr)
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+ }
MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
@@ -959,12 +979,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
kfree(in);
mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
- mlx5_vdpa_get_mr(mvdev, vq_mr);
- mvq->vq_mr = vq_mr;
+ if (filled) {
+ mlx5_vdpa_get_mr(mvdev, vq_mr);
+ mvq->vq_mr = vq_mr;
- if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported)) {
- mlx5_vdpa_get_mr(mvdev, vq_desc_mr);
- mvq->desc_mr = vq_desc_mr;
+ if (vq_desc_mr &&
+ MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported)) {
+ mlx5_vdpa_get_mr(mvdev, vq_desc_mr);
+ mvq->desc_mr = vq_desc_mr;
+ }
}
return 0;
@@ -1442,7 +1465,9 @@ static void dealloc_vector(struct mlx5_vdpa_net *ndev,
}
}
-static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int setup_vq(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ bool filled)
{
u16 idx = mvq->index;
int err;
@@ -1471,7 +1496,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
goto err_connect;
alloc_vector(ndev, mvq);
- err = create_virtqueue(ndev, mvq);
+ err = create_virtqueue(ndev, mvq, filled);
if (err)
goto err_vq;
@@ -2062,7 +2087,7 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps)
} else {
ndev->cur_num_vqs = 2 * newqps;
for (i = cur_qps * 2; i < 2 * newqps; i++) {
- err = setup_vq(ndev, &ndev->vqs[i]);
+ err = setup_vq(ndev, &ndev->vqs[i], true);
if (err)
goto clean_added;
}
@@ -2558,14 +2583,14 @@ static int verify_driver_features(struct mlx5_vdpa_dev *mvdev, u64 features)
return 0;
}
-static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev)
+static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev, bool filled)
{
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
int err;
int i;
for (i = 0; i < mvdev->max_vqs; i++) {
- err = setup_vq(ndev, &ndev->vqs[i]);
+ err = setup_vq(ndev, &ndev->vqs[i], filled);
if (err)
goto err_vq;
}
@@ -2877,7 +2902,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
if (teardown) {
restore_channels_info(ndev);
- err = setup_vq_resources(ndev);
+ err = setup_vq_resources(ndev, true);
if (err)
return err;
}
@@ -2888,7 +2913,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
}
/* reslock must be held for this function */
-static int setup_vq_resources(struct mlx5_vdpa_net *ndev)
+static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled)
{
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
int err;
@@ -2906,7 +2931,7 @@ static int setup_vq_resources(struct mlx5_vdpa_net *ndev)
if (err)
goto err_setup;
- err = setup_virtqueues(mvdev);
+ err = setup_virtqueues(mvdev, filled);
if (err) {
mlx5_vdpa_warn(mvdev, "setup_virtqueues\n");
goto err_setup;
@@ -3000,7 +3025,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
- err = setup_vq_resources(ndev);
+ err = setup_vq_resources(ndev, true);
if (err) {
mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
goto err_driver;
--
2.45.1
* [PATCH vhost 15/23] vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq()
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Until now resume_vq() was used only for the suspend/resume scenario.
This change also allows calling resume_vq() to bring a VQ from the Init
to the Ready state (VQ initialization).
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index a2dd8fd58afa..e4d68d2d0bb4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1557,11 +1557,31 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
- if (!mvq->initialized || !is_resumable(ndev))
+ if (!mvq->initialized)
return;
- if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+ switch (mvq->fw_state) {
+ case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
+ /* Due to a FW quirk we need to modify the VQ fields first then change state.
+ * This should be fixed soon. After that, a single command can be used.
+ */
+ if (modify_virtqueue(ndev, mvq, 0))
+ mlx5_vdpa_warn(&ndev->mvdev,
+ "modify vq properties failed for vq %u\n", mvq->index);
+ break;
+ case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+ if (!is_resumable(ndev)) {
+ mlx5_vdpa_warn(&ndev->mvdev, "vq %d is not resumable\n", mvq->index);
+ return;
+ }
+ break;
+ case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
return;
+ default:
+ mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
+ mvq->index, mvq->fw_state);
+ return;
+ }
if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
--
2.45.1
* [PATCH vhost 16/23] vdpa/mlx5: Add error code for suspend/resume VQ
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Instead of blindly calling suspend/resume_vqs(), make them return error
codes.
To keep compatibility, keep suspending or resuming VQs on error and
return the last error code. The assumption here is that the error code
would be the same.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 77 +++++++++++++++++++++++++++------------
1 file changed, 54 insertions(+), 23 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index e4d68d2d0bb4..e3a82c43b44e 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1526,71 +1526,102 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
return err;
}
-static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
struct mlx5_virtq_attr attr;
+ int err;
if (!mvq->initialized)
- return;
+ return 0;
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
- return;
+ return 0;
- if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
- mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
+ return err;
+ }
- if (query_virtqueue(ndev, mvq, &attr)) {
- mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
- return;
+ err = query_virtqueue(ndev, mvq, &attr);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
+ return err;
}
+
mvq->avail_idx = attr.available_index;
mvq->used_idx = attr.used_index;
+
+ return 0;
}
-static void suspend_vqs(struct mlx5_vdpa_net *ndev)
+static int suspend_vqs(struct mlx5_vdpa_net *ndev)
{
+ int err = 0;
int i;
- for (i = 0; i < ndev->cur_num_vqs; i++)
- suspend_vq(ndev, &ndev->vqs[i]);
+ for (i = 0; i < ndev->cur_num_vqs; i++) {
+ int local_err = suspend_vq(ndev, &ndev->vqs[i]);
+
+ err = local_err ? local_err : err;
+ }
+
+ return err;
}
-static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
+ int err;
+
if (!mvq->initialized)
- return;
+ return 0;
switch (mvq->fw_state) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
/* Due to a FW quirk we need to modify the VQ fields first then change state.
* This should be fixed soon. After that, a single command can be used.
*/
- if (modify_virtqueue(ndev, mvq, 0))
+ err = modify_virtqueue(ndev, mvq, 0);
+ if (err) {
mlx5_vdpa_warn(&ndev->mvdev,
- "modify vq properties failed for vq %u\n", mvq->index);
+ "modify vq properties failed for vq %u, err: %d\n",
+ mvq->index, err);
+ return err;
+ }
break;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
if (!is_resumable(ndev)) {
mlx5_vdpa_warn(&ndev->mvdev, "vq %d is not resumable\n", mvq->index);
- return;
+ return -EINVAL;
}
break;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
- return;
+ return 0;
default:
mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
mvq->index, mvq->fw_state);
- return;
+ return -EINVAL;
}
- if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
- mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ if (err)
+ mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
+ mvq->index, err);
+
+ return err;
}
-static void resume_vqs(struct mlx5_vdpa_net *ndev)
+static int resume_vqs(struct mlx5_vdpa_net *ndev)
{
- for (int i = 0; i < ndev->cur_num_vqs; i++)
- resume_vq(ndev, &ndev->vqs[i]);
+ int err = 0;
+
+ for (int i = 0; i < ndev->cur_num_vqs; i++) {
+ int local_err = resume_vq(ndev, &ndev->vqs[i]);
+
+ err = local_err ? local_err : err;
+ }
+
+ return err;
}
static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
--
2.45.1
* [PATCH vhost 17/23] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
There are a few more places modifying the VQ to Ready directly. Let's
consolidate them into resume_vq().
The redundant warnings for resume_vq() errors can also be dropped.
There is one special case that needs to be handled for virtio-vdpa:
the initialized flag must be set to true earlier in setup_vq() so that
resume_vq() doesn't return early.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index e3a82c43b44e..f5d5b25cdb01 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -160,6 +160,7 @@ static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
static void init_mvqs(struct mlx5_vdpa_net *ndev);
static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled);
static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
+static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq);
static bool mlx5_vdpa_debug;
@@ -1500,16 +1501,14 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
if (err)
goto err_vq;
+ mvq->initialized = true;
+
if (mvq->ready) {
- err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
- if (err) {
- mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
- idx, err);
+ err = resume_vq(ndev, mvq);
+ if (err)
goto err_modify;
- }
}
- mvq->initialized = true;
return 0;
err_modify:
@@ -2422,7 +2421,6 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
struct mlx5_vdpa_virtqueue *mvq;
- int err;
if (!mvdev->actual_features)
return;
@@ -2439,14 +2437,10 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
if (!ready) {
suspend_vq(ndev, mvq);
} else {
- err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
- if (err) {
- mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed (%d)\n", idx, err);
+ if (resume_vq(ndev, mvq))
ready = false;
- }
}
-
mvq->ready = ready;
}
--
2.45.1
* [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (16 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 17/23] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq() Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-23 11:19 ` Zhu Yanjun
2024-06-17 15:07 ` [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change Dragos Tatulea
` (4 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Start using the suspend/resume_vq() error return codes previously added.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f5d5b25cdb01..0e1c1b7ff297 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3436,22 +3436,25 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
mlx5_vdpa_info(mvdev, "suspending device\n");
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
- suspend_vqs(ndev);
+ err = suspend_vqs(ndev);
mlx5_vdpa_cvq_suspend(mvdev);
mvdev->suspended = true;
up_write(&ndev->reslock);
- return 0;
+
+ return err;
}
static int mlx5_vdpa_resume(struct vdpa_device *vdev)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev;
+ int err;
ndev = to_mlx5_vdpa_ndev(mvdev);
@@ -3459,10 +3462,11 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
down_write(&ndev->reslock);
mvdev->suspended = false;
- resume_vqs(ndev);
+ err = resume_vqs(ndev);
register_link_notifier(ndev);
up_write(&ndev->reslock);
- return 0;
+
+ return err;
}
static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
--
2.45.1
* [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (17 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-19 15:46 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Dragos Tatulea
` (3 subsequent siblings)
22 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Resume a VQ if it is already created when the number of VQ pairs
increases. This is done in preparation for VQ pre-creation which is
coming in a later patch. It is necessary because calling setup_vq() on
an already created VQ will return early and will not enable the queue.
For symmetry, suspend a VQ instead of tearing it down when the number of
VQ pairs decreases, but only if the resume operation is supported.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
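A condensed sketch of the resulting change_num_qps() logic (simplified
from the diff below, error handling trimmed):

if (2 * newqps < ndev->cur_num_vqs) {
	/* Shrink: keep hardware VQs around in a suspended state
	 * whenever the device can resume them later.
	 */
	for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) {
		if (is_resumable(ndev))
			suspend_vq(ndev, &ndev->vqs[i]);
		else
			teardown_vq(ndev, &ndev->vqs[i]);
	}
} else {
	/* Grow: resume VQs that already exist, set up the rest. */
	for (i = cur_qps * 2; i < 2 * newqps; i++) {
		struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];

		err = mvq->initialized ? resume_vq(ndev, mvq)
				       : setup_vq(ndev, mvq, true);
	}
}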
drivers/vdpa/mlx5/net/mlx5_vnet.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0e1c1b7ff297..249b5afbe34a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2130,14 +2130,22 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps)
if (err)
return err;
- for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--)
- teardown_vq(ndev, &ndev->vqs[i]);
+ for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) {
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+ if (is_resumable(ndev))
+ suspend_vq(ndev, mvq);
+ else
+ teardown_vq(ndev, mvq);
+ }
ndev->cur_num_vqs = 2 * newqps;
} else {
ndev->cur_num_vqs = 2 * newqps;
for (i = cur_qps * 2; i < 2 * newqps; i++) {
- err = setup_vq(ndev, &ndev->vqs[i], true);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
+
+ err = mvq->initialized ? resume_vq(ndev, mvq) : setup_vq(ndev, mvq, true);
if (err)
goto clean_added;
}
--
2.45.1
* [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (18 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-19 15:54 ` Eugenio Perez Martin
2024-07-08 16:22 ` Zhu Yanjun
2024-06-17 15:07 ` [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions Dragos Tatulea
` (2 subsequent siblings)
22 siblings, 2 replies; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
Currently, hardware VQs are created right when the vdpa device gets into
DRIVER_OK state. That is easier because most of the VQ state is known by
then.
This patch switches to creating all VQs and their associated resources
at device creation time. The motivation is to reduce the vdpa device
live migration downtime by moving the expensive operation of creating
all the hardware VQs and their associated resources out of downtime on
the destination VM.
The VQs are now created in a blank state. The VQ configuration will
happen later, on DRIVER_OK. Then the configuration will be applied when
the VQs are moved to the Ready state.
When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
needed: now that the VQ is already created, resume_vq() would be
triggered too early, before any mr has been configured. Skip calling
resume_vq() in this case and let it be handled during DRIVER_OK.
For virtio-vdpa, the device configuration is done earlier during
.vdpa_dev_add() by vdpa_register_device(). Avoid calling
setup_vq_resources() a second time in that case.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
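The gist of the new DRIVER_OK handling, as a simplified sketch (the
full context is in the diff below):

if (ndev->setup) {
	/* VQ resources were pre-created blank at .dev_add() time:
	 * only apply the received configuration and resume.
	 */
	err = resume_vqs(ndev);
} else {
	/* No pre-created resources: fall back to the full setup. */
	err = setup_vq_resources(ndev, true);
}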
drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 249b5afbe34a..b2836fd3d1dd 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
mvq = &ndev->vqs[idx];
if (!ready) {
suspend_vq(ndev, mvq);
- } else {
+ } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
if (resume_vq(ndev, mvq))
ready = false;
}
@@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
- err = setup_vq_resources(ndev, true);
- if (err) {
- mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
- goto err_driver;
+ if (ndev->setup) {
+ err = resume_vqs(ndev);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
+ goto err_driver;
+ }
+ } else {
+ err = setup_vq_resources(ndev, true);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
+ goto err_driver;
+ }
}
} else {
mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
@@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
+ setup_vq_resources(ndev, false);
up_write(&ndev->reslock);
return 0;
@@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
goto err_reg;
mgtdev->ndev = ndev;
+
+ /* For virtio-vdpa, the device was set up during device register. */
+ if (ndev->setup)
+ return 0;
+
+ down_write(&ndev->reslock);
+ err = setup_vq_resources(ndev, false);
+ up_write(&ndev->reslock);
+ if (err)
+ goto err_setup_vq_res;
+
return 0;
+err_setup_vq_res:
+ _vdpa_unregister_device(&mvdev->vdev);
err_reg:
destroy_workqueue(mvdev->wq);
err_res2:
@@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
unregister_link_notifier(ndev);
_vdpa_unregister_device(dev);
+
+ down_write(&ndev->reslock);
+ teardown_vq_resources(ndev);
+ up_write(&ndev->reslock);
+
wq = mvdev->wq;
mvdev->wq = NULL;
destroy_workqueue(wq);
--
2.45.1
* [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (19 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-19 16:04 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() Dragos Tatulea
22 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
There are a few conditions under which the hardware VQs need a full
teardown and setup:
- The VQ size changed to something other than the default value.
Hardware VQ size modification is not supported.
- The user turns off certain device features: mergeable buffers,
checksum, virtio 1.0 compliance. In these cases, the TIR and RQT need
to be re-created.
Add a needs_teardown configuration variable and set it when detecting
the above scenarios. On the next DRIVER_OK, the resources will be torn
down first.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
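Taken together with the previous patch, the DRIVER_OK path now looks
roughly like this (simplified sketch, not the literal code):

/* Features whose change requires re-creating the TIR/RQT and VQs. */
#define NEEDS_TEARDOWN_MASK (BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \
			     BIT_ULL(VIRTIO_NET_F_CSUM) | \
			     BIT_ULL(VIRTIO_F_VERSION_1))

/* In .set_driver_features(): flag any divergence from the initial
 * device features.
 */
diff_features = mvdev->mlx_features ^ mvdev->actual_features;
ndev->needs_teardown = !!(diff_features & NEEDS_TEARDOWN_MASK);

/* On DRIVER_OK: drop stale resources first, then resume or set up. */
if (ndev->needs_teardown)
	teardown_vq_resources(ndev);	/* also clears needs_teardown */

if (ndev->setup)
	err = resume_vqs(ndev);
else
	err = setup_vq_resources(ndev, true);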
drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.h | 1 +
2 files changed, 16 insertions(+)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b2836fd3d1dd..d80d6b47da61 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2390,6 +2390,7 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
}
mvq = &ndev->vqs[idx];
+ ndev->needs_teardown = num != mvq->num_ent;
mvq->num_ent = num;
}
@@ -2800,6 +2801,7 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
u64 old_features = mvdev->actual_features;
+ u64 diff_features;
int err;
print_features(mvdev, features, true);
@@ -2822,6 +2824,14 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
}
}
+ /* When below features diverge from initial device features, VQs need a full teardown. */
+#define NEEDS_TEARDOWN_MASK (BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \
+ BIT_ULL(VIRTIO_NET_F_CSUM) | \
+ BIT_ULL(VIRTIO_F_VERSION_1))
+
+ diff_features = mvdev->mlx_features ^ mvdev->actual_features;
+ ndev->needs_teardown = !!(diff_features & NEEDS_TEARDOWN_MASK);
+
update_cvq_info(mvdev);
return err;
}
@@ -3038,6 +3048,7 @@ static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
destroy_rqt(ndev);
teardown_virtqueues(ndev);
ndev->setup = false;
+ ndev->needs_teardown = false;
}
static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
@@ -3078,6 +3089,10 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
goto err_setup;
}
register_link_notifier(ndev);
+
+ if (ndev->needs_teardown)
+ teardown_vq_resources(ndev);
+
if (ndev->setup) {
err = resume_vqs(ndev);
if (err) {
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
index 2ada29767cc5..da7318f82d2a 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -56,6 +56,7 @@ struct mlx5_vdpa_net {
struct dentry *rx_dent;
struct dentry *rx_table_dent;
bool setup;
+ bool needs_teardown;
u32 cur_num_vqs;
u32 rqt_size;
u16 default_queue_size;
--
2.45.1
* [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (20 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-19 16:14 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() Dragos Tatulea
22 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
The vdpa device can be reset many times in sequence without any
significant state changes in between. Previously this was not a problem:
VQs were torn down only on first reset. But after VQ pre-creation was
introduced, each reset will delete and re-create the hardware VQs and
their associated resources.
To solve this problem, avoid resetting hardware VQs if the VQs are still
in a blank state.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
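The "blank state" test boils down to three checks on the first VQ; an
annotated sketch of the helper added below:

static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
{
	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[0];

	/* The driver was active, so the VQs carry real state. */
	if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
		return true;

	/* The VQ left the blank INIT firmware state at some point. */
	if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT)
		return true;

	/* State-carrying fields are pending modification, so the VQ
	 * is no longer considered blank.
	 */
	return mvq->modified_fields & (MLX5_VIRTQ_MODIFY_MASK_STATE |
				       MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS |
				       MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
				       MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX);
}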
drivers/vdpa/mlx5/net/mlx5_vnet.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d80d6b47da61..1a5ee0d2b47f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3134,18 +3134,41 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev *mvdev)
mvdev->group2asid[i] = 0;
}
+static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[0];
+
+ if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
+ return true;
+
+ if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT)
+ return true;
+
+ return mvq->modified_fields & (
+ MLX5_VIRTQ_MODIFY_MASK_STATE |
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS |
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX
+ );
+}
+
static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ bool vq_reset;
print_status(mvdev, 0, true);
mlx5_vdpa_info(mvdev, "performing device reset\n");
down_write(&ndev->reslock);
unregister_link_notifier(ndev);
- teardown_vq_resources(ndev);
- init_mvqs(ndev);
+ vq_reset = needs_vqs_reset(mvdev);
+ if (vq_reset) {
+ teardown_vq_resources(ndev);
+ init_mvqs(ndev);
+ }
if (flags & VDPA_RESET_F_CLEAN_MAP)
mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
@@ -3165,7 +3188,8 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
- setup_vq_resources(ndev, false);
+ if (vq_reset)
+ setup_vq_resources(ndev, false);
up_write(&ndev->reslock);
return 0;
--
2.45.1
* [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
` (21 preceding siblings ...)
2024-06-17 15:07 ` [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary Dragos Tatulea
@ 2024-06-17 15:07 ` Dragos Tatulea
2024-06-19 16:39 ` Eugenio Perez Martin
22 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:07 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Dragos Tatulea,
Cosmin Ratiu
VQ indices in the range [cur_num_vqs, max_vqs) represent queues that
have not yet been activated. .set_vq_ready() should not activate these
VQs.
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
---
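The guard is a single comparison in resume_vq() against the count of
currently active VQs (sketch; see the diff below for the exact
placement):

static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
	if (!mvq->initialized)
		return 0;

	/* VQs with index >= cur_num_vqs exist in hardware (they were
	 * pre-created) but are not part of the active set; leave them
	 * alone until the number of VQ pairs grows.
	 */
	if (mvq->index >= ndev->cur_num_vqs)
		return 0;

	/* ... continue with the fw_state machine ... */
}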
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1a5ee0d2b47f..a969a7f105a6 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1575,6 +1575,9 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
if (!mvq->initialized)
return 0;
+ if (mvq->index >= ndev->cur_num_vqs)
+ return 0;
+
switch (mvq->fw_state) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
/* Due to a FW quirk we need to modify the VQ fields first then change state.
--
2.45.1
* Re: [PATCH vhost 01/23] vdpa/mlx5: Clarify meaning thorough function rename
2024-06-17 15:07 ` [PATCH vhost 01/23] vdpa/mlx5: Clarify meaning thorough function rename Dragos Tatulea
@ 2024-06-19 10:37 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 10:37 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> setup_driver()/teardown_driver() are a bit vague. These functions are
> used for virtqueue resources.
>
> Same for alloc_resources()/teardown_resources(): they represent fixed
> resources that are meant to exist during the device lifetime.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 28 ++++++++++++++--------------
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index ecfc16151d61..3422da0e344b 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -144,10 +144,10 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
> return idx <= mvdev->max_idx;
> }
>
> -static void free_resources(struct mlx5_vdpa_net *ndev);
> +static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
> static void init_mvqs(struct mlx5_vdpa_net *ndev);
> -static int setup_driver(struct mlx5_vdpa_dev *mvdev);
> -static void teardown_driver(struct mlx5_vdpa_net *ndev);
> +static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
> +static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
>
> static bool mlx5_vdpa_debug;
>
> @@ -2848,7 +2848,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
> if (err)
> return err;
>
> - teardown_driver(ndev);
> + teardown_vq_resources(ndev);
> }
>
> mlx5_vdpa_update_mr(mvdev, new_mr, asid);
> @@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
>
> if (teardown) {
> restore_channels_info(ndev);
> - err = setup_driver(mvdev);
> + err = setup_vq_resources(mvdev);
> if (err)
> return err;
> }
> @@ -2873,7 +2873,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
> }
>
> /* reslock must be held for this function */
> -static int setup_driver(struct mlx5_vdpa_dev *mvdev)
> +static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> int err;
> @@ -2931,7 +2931,7 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
> }
>
> /* reslock must be held for this function */
> -static void teardown_driver(struct mlx5_vdpa_net *ndev)
> +static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
> {
>
> WARN_ON(!rwsem_is_locked(&ndev->reslock));
> @@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> goto err_setup;
> }
> register_link_notifier(ndev);
> - err = setup_driver(mvdev);
> + err = setup_vq_resources(mvdev);
> if (err) {
> mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> goto err_driver;
> @@ -3040,7 +3040,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
>
> down_write(&ndev->reslock);
> unregister_link_notifier(ndev);
> - teardown_driver(ndev);
> + teardown_vq_resources(ndev);
> clear_vqs_ready(ndev);
> if (flags & VDPA_RESET_F_CLEAN_MAP)
> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> @@ -3197,7 +3197,7 @@ static void mlx5_vdpa_free(struct vdpa_device *vdev)
>
> ndev = to_mlx5_vdpa_ndev(mvdev);
>
> - free_resources(ndev);
> + free_fixed_resources(ndev);
> mlx5_vdpa_destroy_mr_resources(mvdev);
> if (!is_zero_ether_addr(ndev->config.mac)) {
> pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev));
> @@ -3467,7 +3467,7 @@ static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
> return 0;
> }
>
> -static int alloc_resources(struct mlx5_vdpa_net *ndev)
> +static int alloc_fixed_resources(struct mlx5_vdpa_net *ndev)
> {
> struct mlx5_vdpa_net_resources *res = &ndev->res;
> int err;
> @@ -3494,7 +3494,7 @@ static int alloc_resources(struct mlx5_vdpa_net *ndev)
> return err;
> }
>
> -static void free_resources(struct mlx5_vdpa_net *ndev)
> +static void free_fixed_resources(struct mlx5_vdpa_net *ndev)
> {
> struct mlx5_vdpa_net_resources *res = &ndev->res;
>
> @@ -3735,7 +3735,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> goto err_res;
> }
>
> - err = alloc_resources(ndev);
> + err = alloc_fixed_resources(ndev);
> if (err)
> goto err_mr;
>
> @@ -3758,7 +3758,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> err_reg:
> destroy_workqueue(mvdev->wq);
> err_res2:
> - free_resources(ndev);
> + free_fixed_resources(ndev);
> err_mr:
> mlx5_vdpa_destroy_mr_resources(mvdev);
> err_res:
>
> --
> 2.45.1
>
* Re: [PATCH vhost 02/23] vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical
2024-06-17 15:07 ` [PATCH vhost 02/23] vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical Dragos Tatulea
@ 2024-06-19 10:38 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 10:38 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> ... by changing the setup_vq_resources() parameter.
s/parameter/parameter type/ ?
Either way,
Acked-by: Eugenio Pérez <eperezma@redhat.com>
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 3422da0e344b..1ad281cbc541 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -146,7 +146,7 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx)
>
> static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
> static void init_mvqs(struct mlx5_vdpa_net *ndev);
> -static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev);
> +static int setup_vq_resources(struct mlx5_vdpa_net *ndev);
> static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
>
> static bool mlx5_vdpa_debug;
> @@ -2862,7 +2862,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
>
> if (teardown) {
> restore_channels_info(ndev);
> - err = setup_vq_resources(mvdev);
> + err = setup_vq_resources(ndev);
> if (err)
> return err;
> }
> @@ -2873,9 +2873,9 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
> }
>
> /* reslock must be held for this function */
> -static int setup_vq_resources(struct mlx5_vdpa_dev *mvdev)
> +static int setup_vq_resources(struct mlx5_vdpa_net *ndev)
> {
> - struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
> int err;
>
> WARN_ON(!rwsem_is_locked(&ndev->reslock));
> @@ -2997,7 +2997,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> goto err_setup;
> }
> register_link_notifier(ndev);
> - err = setup_vq_resources(mvdev);
> + err = setup_vq_resources(ndev);
> if (err) {
> mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> goto err_driver;
>
> --
> 2.45.1
>
* Re: [PATCH vhost 03/23] vdpa/mlx5: Drop redundant code
2024-06-17 15:07 ` [PATCH vhost 03/23] vdpa/mlx5: Drop redundant code Dragos Tatulea
@ 2024-06-19 10:55 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 10:55 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
Patch message suggestion:
Originally, the second loop initialized the CVQ. But commit acde3929492b
("vdpa/mlx5: Use consistent RQT size") initialized all the queues in
the first loop, so the second iteration in ...
>
> The second iteration in init_mvqs() is never called because the first
> one will iterate up to max_vqs.
>
Either way,
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 1ad281cbc541..b4d9ef4f66c8 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3519,12 +3519,6 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
> mvq->fwqp.fw = true;
> mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
> }
> - for (; i < ndev->mvdev.max_vqs; i++) {
> - mvq = &ndev->vqs[i];
> - memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
> - mvq->index = i;
> - mvq->ndev = ndev;
> - }
> }
>
> struct mlx5_vdpa_mgmtdev {
>
> --
> 2.45.1
>
* Re: [PATCH vhost 04/23] vdpa/mlx5: Drop redundant check in teardown_virtqueues()
2024-06-17 15:07 ` [PATCH vhost 04/23] vdpa/mlx5: Drop redundant check in teardown_virtqueues() Dragos Tatulea
@ 2024-06-19 10:56 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 10:56 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The check is done inside teardown_vq().
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index b4d9ef4f66c8..96782b34e2b2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2559,16 +2559,10 @@ static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev)
>
> static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
> {
> - struct mlx5_vdpa_virtqueue *mvq;
> int i;
>
> - for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> - mvq = &ndev->vqs[i];
> - if (!mvq->initialized)
> - continue;
> -
> - teardown_vq(ndev, mvq);
> - }
> + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--)
> + teardown_vq(ndev, &ndev->vqs[i]);
> }
>
> static void update_cvq_info(struct mlx5_vdpa_dev *mvdev)
>
> --
> 2.45.1
>
* Re: [PATCH vhost 06/23] vdpa/mlx5: Remove duplicate suspend code
2024-06-17 15:07 ` [PATCH vhost 06/23] vdpa/mlx5: Remove duplicate suspend code Dragos Tatulea
@ 2024-06-19 11:02 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 11:02 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Use the dedicated suspend_vqs() function instead.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 51630b1935f4..eca6f68c2eda 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3355,17 +3355,12 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> - struct mlx5_vdpa_virtqueue *mvq;
> - int i;
>
> mlx5_vdpa_info(mvdev, "suspending device\n");
>
> down_write(&ndev->reslock);
> unregister_link_notifier(ndev);
> - for (i = 0; i < ndev->cur_num_vqs; i++) {
> - mvq = &ndev->vqs[i];
> - suspend_vq(ndev, mvq);
> - }
> + suspend_vqs(ndev);
> mlx5_vdpa_cvq_suspend(mvdev);
> mvdev->suspended = true;
> up_write(&ndev->reslock);
>
> --
> 2.45.1
>
* Re: [PATCH vhost 05/23] vdpa/mlx5: Iterate over active VQs during suspend/resume
2024-06-17 15:07 ` [PATCH vhost 05/23] vdpa/mlx5: Iterate over active VQs during suspend/resume Dragos Tatulea
@ 2024-06-19 11:04 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 11:04 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> No need to iterate over max number of VQs.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 96782b34e2b2..51630b1935f4 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1504,7 +1504,7 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> {
> int i;
>
> - for (i = 0; i < ndev->mvdev.max_vqs; i++)
> + for (i = 0; i < ndev->cur_num_vqs; i++)
> suspend_vq(ndev, &ndev->vqs[i]);
> }
>
> @@ -1522,7 +1522,7 @@ static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mv
>
> static void resume_vqs(struct mlx5_vdpa_net *ndev)
> {
> - for (int i = 0; i < ndev->mvdev.max_vqs; i++)
> + for (int i = 0; i < ndev->cur_num_vqs; i++)
> resume_vq(ndev, &ndev->vqs[i]);
> }
>
>
> --
> 2.45.1
>
* Re: [PATCH vhost 08/23] vdpa/mlx5: Clear and reinitialize software VQ data on reset
2024-06-17 15:07 ` [PATCH vhost 08/23] vdpa/mlx5: Clear and reinitialize software VQ data on reset Dragos Tatulea
@ 2024-06-19 11:28 ` Eugenio Perez Martin
2024-06-19 17:03 ` Dragos Tatulea
0 siblings, 1 reply; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 11:28 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The hardware VQ configuration is mirrored by data in struct
> mlx5_vdpa_virtqueue . Instead of clearing just a few fields at reset,
> fully clear the struct and initialize with the appropriate default
> values.
>
> As clear_vqs_ready() is used only during reset, get rid of it.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 +++-------------
> 1 file changed, 3 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index c8b5c87f001d..de013b5a2815 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2941,18 +2941,6 @@ static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
> ndev->setup = false;
> }
>
> -static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
> -{
> - int i;
> -
> - for (i = 0; i < ndev->mvdev.max_vqs; i++) {
> - ndev->vqs[i].ready = false;
> - ndev->vqs[i].modified_fields = 0;
> - }
> -
> - ndev->mvdev.cvq.ready = false;
> -}
> -
> static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
> {
> struct mlx5_control_vq *cvq = &mvdev->cvq;
> @@ -3035,12 +3023,14 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> down_write(&ndev->reslock);
> unregister_link_notifier(ndev);
> teardown_vq_resources(ndev);
> - clear_vqs_ready(ndev);
> + init_mvqs(ndev);
Nitpick / suggestion if you have to send a v2. The init_mvqs function
name sounds to me like it can allocate stuff that needs to be released.
But I'm very bad at naming :). Maybe something like
"mvqs_set_defaults" or similar?
> +
> if (flags & VDPA_RESET_F_CLEAN_MAP)
> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> ndev->mvdev.status = 0;
> ndev->mvdev.suspended = false;
> ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
> + ndev->mvdev.cvq.ready = false;
> ndev->mvdev.cvq.received_desc = 0;
> ndev->mvdev.cvq.completed_desc = 0;
> memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 1));
>
> --
> 2.45.1
>
* Re: [PATCH vhost 11/23] vdpa/mlx5: Set an initial size on the VQ
2024-06-17 15:07 ` [PATCH vhost 11/23] vdpa/mlx5: Set an initial size on the VQ Dragos Tatulea
@ 2024-06-19 15:08 ` Eugenio Perez Martin
2024-06-19 17:06 ` Dragos Tatulea
0 siblings, 1 reply; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:08 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The virtqueue size is a pre-requisite for setting up any virtqueue
> resources. For the upcoming optimization of creating virtqueues at
> device add, the virtqueue size has to be configured.
>
> Store the default queue size in struct mlx5_vdpa_net to make it easy in
> the future to pre-configure this default value via vdpa tool.
>
> The queue size check in setup_vq() will always be false. So remove it.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 ++++---
> drivers/vdpa/mlx5/net/mlx5_vnet.h | 1 +
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 245b5dac98d3..1181e0ac3671 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -58,6 +58,8 @@ MODULE_LICENSE("Dual BSD/GPL");
> */
> #define MLX5V_DEFAULT_VQ_COUNT 2
>
> +#define MLX5V_DEFAULT_VQ_SIZE 256
> +
> struct mlx5_vdpa_cq_buf {
> struct mlx5_frag_buf_ctrl fbc;
> struct mlx5_frag_buf frag_buf;
> @@ -1445,9 +1447,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> u16 idx = mvq->index;
> int err;
>
> - if (!mvq->num_ent)
> - return 0;
> -
> if (mvq->initialized)
> return 0;
>
> @@ -3523,6 +3522,7 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
> mvq->ndev = ndev;
> mvq->fwqp.fw = true;
> mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
> + mvq->num_ent = ndev->default_queue_size;
> }
> }
>
> @@ -3660,6 +3660,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> goto err_alloc;
> }
> ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
> + ndev->default_queue_size = MLX5V_DEFAULT_VQ_SIZE;
>
> init_mvqs(ndev);
> allocate_irqs(ndev);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> index 90b556a57971..2ada29767cc5 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> @@ -58,6 +58,7 @@ struct mlx5_vdpa_net {
> bool setup;
> u32 cur_num_vqs;
> u32 rqt_size;
> + u16 default_queue_size;
It seems to me this is only assigned here and not used in the rest of
the series, so why allocate a member here instead of using the macro
directly?
> bool nb_registered;
> struct notifier_block nb;
> struct vdpa_callback config_cb;
>
> --
> 2.45.1
>
* Re: [PATCH vhost 12/23] vdpa/mlx5: Start off rqt_size with max VQPs
2024-06-17 15:07 ` [PATCH vhost 12/23] vdpa/mlx5: Start off rqt_size with max VQPs Dragos Tatulea
@ 2024-06-19 15:33 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:33 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Currently rqt_size is initialized during device flag configuration.
> That's because it is the earliest moment when the device knows if MQ
> (multi queue) is on or off.
>
> Shift this configuration earlier to device creation time. This implies
> that non-MQ devices will have a larger RQT size. But the configuration
> will still be correct.
>
> This is done in preparation for the pre-creation of hardware virtqueues
> at device add time. When that change is added, the RQT will be created
> at device creation time, so it needs to be initialized to its max size.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 1181e0ac3671..0201c6fe61e1 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2731,10 +2731,6 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
> return err;
>
> ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> - if (ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_MQ))
> - ndev->rqt_size = mlx5vdpa16_to_cpu(mvdev, ndev->config.max_virtqueue_pairs);
> - else
> - ndev->rqt_size = 1;
>
> /* Interested in changes of vq features only. */
> if (get_features(old_features) != get_features(mvdev->actual_features)) {
> @@ -3719,8 +3715,12 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> goto err_alloc;
> }
>
> - if (device_features & BIT_ULL(VIRTIO_NET_F_MQ))
> + if (device_features & BIT_ULL(VIRTIO_NET_F_MQ)) {
> config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2);
> + ndev->rqt_size = max_vqs / 2;
> + } else {
> + ndev->rqt_size = 1;
> + }
>
> ndev->mvdev.mlx_features = device_features;
> mvdev->vdev.dma_dev = &mdev->pdev->dev;
>
> --
> 2.45.1
>
* Re: [PATCH vhost 13/23] vdpa/mlx5: Set mkey modified flags on all VQs
2024-06-17 15:07 ` [PATCH vhost 13/23] vdpa/mlx5: Set mkey modified flags on all VQs Dragos Tatulea
@ 2024-06-19 15:33 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:33 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Otherwise, when virtqueues are moved from INIT to READY the latest mkey
> will not be set appropriately.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 0201c6fe61e1..0abe01fd20e9 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2868,7 +2868,7 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
>
> mlx5_vdpa_update_mr(mvdev, new_mr, asid);
>
> - for (int i = 0; i < ndev->cur_num_vqs; i++)
> + for (int i = 0; i < mvdev->max_vqs; i++)
> ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
> MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
>
>
> --
> 2.45.1
>
* Re: [PATCH vhost 16/23] vdpa/mlx5: Add error code for suspend/resume VQ
2024-06-17 15:07 ` [PATCH vhost 16/23] vdpa/mlx5: Add error code for suspend/resume VQ Dragos Tatulea
@ 2024-06-19 15:41 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:41 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Instead of blindly calling suspend/resume_vqs(), make them return error
> codes.
>
> To keep compatibility, keep suspending or resuming VQs on error and
> return the last error code. The assumption here is that the error code
> would be the same.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 77 +++++++++++++++++++++++++++------------
> 1 file changed, 54 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index e4d68d2d0bb4..e3a82c43b44e 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1526,71 +1526,102 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
> return err;
> }
>
> -static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +static int suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> {
> struct mlx5_virtq_attr attr;
> + int err;
>
> if (!mvq->initialized)
> - return;
> + return 0;
>
> if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
> - return;
> + return 0;
>
> - if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> - mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> + err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed, err: %d\n", err);
> + return err;
> + }
>
> - if (query_virtqueue(ndev, mvq, &attr)) {
> - mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> - return;
> + err = query_virtqueue(ndev, mvq, &attr);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue, err: %d\n", err);
> + return err;
> }
> +
> mvq->avail_idx = attr.available_index;
> mvq->used_idx = attr.used_index;
> +
> + return 0;
> }
>
> -static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> +static int suspend_vqs(struct mlx5_vdpa_net *ndev)
> {
> + int err = 0;
> int i;
>
> - for (i = 0; i < ndev->cur_num_vqs; i++)
> - suspend_vq(ndev, &ndev->vqs[i]);
> + for (i = 0; i < ndev->cur_num_vqs; i++) {
> + int local_err = suspend_vq(ndev, &ndev->vqs[i]);
> +
> + err = local_err ? local_err : err;
> + }
> +
> + return err;
> }
>
> -static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> {
> + int err;
> +
> if (!mvq->initialized)
> - return;
> + return 0;
>
> switch (mvq->fw_state) {
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
> /* Due to a FW quirk we need to modify the VQ fields first then change state.
> * This should be fixed soon. After that, a single command can be used.
> */
> - if (modify_virtqueue(ndev, mvq, 0))
> + err = modify_virtqueue(ndev, mvq, 0);
> + if (err) {
> mlx5_vdpa_warn(&ndev->mvdev,
> - "modify vq properties failed for vq %u\n", mvq->index);
> + "modify vq properties failed for vq %u, err: %d\n",
> + mvq->index, err);
> + return err;
> + }
> break;
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
> if (!is_resumable(ndev)) {
> mlx5_vdpa_warn(&ndev->mvdev, "vq %d is not resumable\n", mvq->index);
> - return;
> + return -EINVAL;
> }
> break;
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
> - return;
> + return 0;
> default:
> mlx5_vdpa_warn(&ndev->mvdev, "resume vq %u called from bad state %d\n",
> mvq->index, mvq->fw_state);
> - return;
> + return -EINVAL;
> }
>
> - if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
> - mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
> + err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> + if (err)
> + mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u, err: %d\n",
> + mvq->index, err);
> +
> + return err;
> }
>
> -static void resume_vqs(struct mlx5_vdpa_net *ndev)
> +static int resume_vqs(struct mlx5_vdpa_net *ndev)
> {
> - for (int i = 0; i < ndev->cur_num_vqs; i++)
> - resume_vq(ndev, &ndev->vqs[i]);
> + int err = 0;
> +
> + for (int i = 0; i < ndev->cur_num_vqs; i++) {
> + int local_err = resume_vq(ndev, &ndev->vqs[i]);
> +
> + err = local_err ? local_err : err;
> + }
> +
> + return err;
> }
>
> static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
>
> --
> 2.45.1
>
* Re: [PATCH vhost 17/23] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq()
2024-06-17 15:07 ` [PATCH vhost 17/23] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq() Dragos Tatulea
@ 2024-06-19 15:43 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:43 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> There are a few more places modifying the VQ to Ready directly. Let's
> consolidate them into resume_vq().
>
> The redundant warnings for resume_vq() errors can also be dropped.
>
> There is one special case that needs to be handled for virtio-vdpa:
> the initialized flag must be set to true earlier in setup_vq() so that
> resume_vq() doesn't return early.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index e3a82c43b44e..f5d5b25cdb01 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -160,6 +160,7 @@ static void free_fixed_resources(struct mlx5_vdpa_net *ndev);
> static void init_mvqs(struct mlx5_vdpa_net *ndev);
> static int setup_vq_resources(struct mlx5_vdpa_net *ndev, bool filled);
> static void teardown_vq_resources(struct mlx5_vdpa_net *ndev);
> +static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq);
>
> static bool mlx5_vdpa_debug;
>
> @@ -1500,16 +1501,14 @@ static int setup_vq(struct mlx5_vdpa_net *ndev,
> if (err)
> goto err_vq;
>
> + mvq->initialized = true;
> +
> if (mvq->ready) {
> - err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> - if (err) {
> - mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
> - idx, err);
> + err = resume_vq(ndev, mvq);
> + if (err)
> goto err_modify;
> - }
> }
>
> - mvq->initialized = true;
> return 0;
>
> err_modify:
> @@ -2422,7 +2421,6 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> struct mlx5_vdpa_virtqueue *mvq;
> - int err;
>
> if (!mvdev->actual_features)
> return;
> @@ -2439,14 +2437,10 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> if (!ready) {
> suspend_vq(ndev, mvq);
> } else {
> - err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> - if (err) {
> - mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed (%d)\n", idx, err);
> + if (resume_vq(ndev, mvq))
> ready = false;
> - }
> }
>
> -
> mvq->ready = ready;
> }
>
>
> --
> 2.45.1
>
* Re: [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change
2024-06-17 15:07 ` [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change Dragos Tatulea
@ 2024-06-19 15:46 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:46 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Resume a VQ if it is already created when the number of VQ pairs
> increases. This is done in preparation for VQ pre-creation which is
> coming in a later patch. It is necessary because calling setup_vq() on
> an already created VQ will return early and will not enable the queue.
>
> For symmetry, suspend a VQ instead of tearing it down when the number of
> VQ pairs decreases, but only if the resume operation is supported.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 0e1c1b7ff297..249b5afbe34a 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2130,14 +2130,22 @@ static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps)
> if (err)
> return err;
>
> - for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--)
> - teardown_vq(ndev, &ndev->vqs[i]);
> + for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) {
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
> +
> + if (is_resumable(ndev))
> + suspend_vq(ndev, mvq);
> + else
> + teardown_vq(ndev, mvq);
> + }
>
> ndev->cur_num_vqs = 2 * newqps;
> } else {
> ndev->cur_num_vqs = 2 * newqps;
> for (i = cur_qps * 2; i < 2 * newqps; i++) {
> - err = setup_vq(ndev, &ndev->vqs[i], true);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];
> +
> + err = mvq->initialized ? resume_vq(ndev, mvq) : setup_vq(ndev, mvq, true);
> if (err)
> goto clean_added;
> }
>
> --
> 2.45.1
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-06-17 15:07 ` [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Dragos Tatulea
@ 2024-06-19 15:54 ` Eugenio Perez Martin
2024-06-26 9:27 ` Dragos Tatulea
2024-07-08 16:22 ` Zhu Yanjun
1 sibling, 1 reply; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 15:54 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Currently, hardware VQs are created right when the vdpa device gets into
> DRIVER_OK state. That is easier because most of the VQ state is known by
> then.
>
> This patch switches to creating all VQs and their associated resources
> at device creation time. The motivation is to reduce the vdpa device
> live migration downtime by moving the expensive operation of creating
> all the hardware VQs and their associated resources out of downtime on
> the destination VM.
>
> The VQs are now created in a blank state. The VQ configuration will
> happen later, on DRIVER_OK. Then the configuration will be applied when
> the VQs are moved to the Ready state.
>
> When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> needed: now that the VQ is already created, resume_vq() would be
> triggered too early, before any mr has been configured. Skip calling
> resume_vq() in this case and let it be handled during DRIVER_OK.
>
> For virtio-vdpa, the device configuration is done earlier during
> .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> setup_vq_resources() a second time in that case.
>
I guess this happens if virtio_vdpa is already loaded, but I cannot
see how this is different here. Apart from the IOTLB, what else does
it change from the mlx5_vdpa POV?
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> 1 file changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 249b5afbe34a..b2836fd3d1dd 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> mvq = &ndev->vqs[idx];
> if (!ready) {
> suspend_vq(ndev, mvq);
> - } else {
> + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> if (resume_vq(ndev, mvq))
> ready = false;
> }
> @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> goto err_setup;
> }
> register_link_notifier(ndev);
> - err = setup_vq_resources(ndev, true);
> - if (err) {
> - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> - goto err_driver;
> + if (ndev->setup) {
> + err = resume_vqs(ndev);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> + goto err_driver;
> + }
> + } else {
> + err = setup_vq_resources(ndev, true);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> + goto err_driver;
> + }
> }
> } else {
> mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> if (mlx5_vdpa_create_dma_mr(mvdev))
> mlx5_vdpa_warn(mvdev, "create MR failed\n");
> }
> + setup_vq_resources(ndev, false);
> up_write(&ndev->reslock);
>
> return 0;
> @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> goto err_reg;
>
> mgtdev->ndev = ndev;
> +
> + /* For virtio-vdpa, the device was set up during device register. */
> + if (ndev->setup)
> + return 0;
> +
> + down_write(&ndev->reslock);
> + err = setup_vq_resources(ndev, false);
> + up_write(&ndev->reslock);
> + if (err)
> + goto err_setup_vq_res;
> +
> return 0;
>
> +err_setup_vq_res:
> + _vdpa_unregister_device(&mvdev->vdev);
> err_reg:
> destroy_workqueue(mvdev->wq);
> err_res2:
> @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
>
> unregister_link_notifier(ndev);
> _vdpa_unregister_device(dev);
> +
> + down_write(&ndev->reslock);
> + teardown_vq_resources(ndev);
> + up_write(&ndev->reslock);
> +
> wq = mvdev->wq;
> mvdev->wq = NULL;
> destroy_workqueue(wq);
>
> --
> 2.45.1
>
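For clarity, the resulting DRIVER_OK branch of mlx5_vdpa_set_status() can
be summarized as follows. This is a condensed paraphrase of the hunk above;
the CVQ setup, link notifier handling and error unwinding around it are
omitted.

	/* On the transition to DRIVER_OK (paraphrased). */
	if (ndev->setup) {
		/* HW VQs were pre-created blank at .dev_add() time. Only
		 * the cheap part remains: apply the negotiated
		 * configuration and move the VQs to the Ready state. */
		err = resume_vqs(ndev);
	} else {
		/* Fallback: create all HW VQ resources now, as before. */
		err = setup_vq_resources(ndev, true);
	}
	if (err)
		goto err_driver;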
* Re: [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions
2024-06-17 15:07 ` [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions Dragos Tatulea
@ 2024-06-19 16:04 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 16:04 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> There are a few conditions under which the hardware VQs need a full
> teardown and setup:
>
> - VQ size changed to something other than the default value. Hardware VQ size
> modification is not supported.
>
> - User turns off certain device features: mergeable buffers, checksum,
> virtio 1.0 compliance. In these cases, the TIR and RQT need to be
> re-created.
>
> Add a needs_teardown configuration variable and set it when detecting
> the above scenarios. On next DRIVER_OK, the resources will be torn down
> first.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 15 +++++++++++++++
> drivers/vdpa/mlx5/net/mlx5_vnet.h | 1 +
> 2 files changed, 16 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index b2836fd3d1dd..d80d6b47da61 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2390,6 +2390,7 @@ static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
> }
>
> mvq = &ndev->vqs[idx];
> + ndev->needs_teardown = num != mvq->num_ent;
> mvq->num_ent = num;
> }
>
> @@ -2800,6 +2801,7 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> u64 old_features = mvdev->actual_features;
> + u64 diff_features;
> int err;
>
> print_features(mvdev, features, true);
> @@ -2822,6 +2824,14 @@ static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
> }
> }
>
> + /* When below features diverge from initial device features, VQs need a full teardown. */
> +#define NEEDS_TEARDOWN_MASK (BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) | \
> + BIT_ULL(VIRTIO_NET_F_CSUM) | \
> + BIT_ULL(VIRTIO_F_VERSION_1))
> +
> + diff_features = mvdev->mlx_features ^ mvdev->actual_features;
> + ndev->needs_teardown = !!(diff_features & NEEDS_TEARDOWN_MASK);
> +
> update_cvq_info(mvdev);
> return err;
> }
> @@ -3038,6 +3048,7 @@ static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
> destroy_rqt(ndev);
> teardown_virtqueues(ndev);
> ndev->setup = false;
> + ndev->needs_teardown = false;
> }
>
> static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
> @@ -3078,6 +3089,10 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> goto err_setup;
> }
> register_link_notifier(ndev);
> +
> + if (ndev->needs_teardown)
> + teardown_vq_resources(ndev);
> +
> if (ndev->setup) {
> err = resume_vqs(ndev);
> if (err) {
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> index 2ada29767cc5..da7318f82d2a 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> @@ -56,6 +56,7 @@ struct mlx5_vdpa_net {
> struct dentry *rx_dent;
> struct dentry *rx_table_dent;
> bool setup;
> + bool needs_teardown;
> u32 cur_num_vqs;
> u32 rqt_size;
> u16 default_queue_size;
>
> --
> 2.45.1
>
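A worked example of the feature check above, with feature sets made up
purely for illustration: if the device initially offered mergeable buffers
but the guest negotiates without them, the XOR flags the divergence and the
next DRIVER_OK performs a full re-create.

	/* Hypothetical feature values, for illustration only. */
	u64 initial_features = BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
			       BIT_ULL(VIRTIO_F_VERSION_1);
	u64 actual_features  = BIT_ULL(VIRTIO_F_VERSION_1); /* MRG_RXBUF off */
	u64 diff_features    = initial_features ^ actual_features;

	/* MRG_RXBUF is part of NEEDS_TEARDOWN_MASK, so this is true and
	 * teardown_vq_resources() will run on the next DRIVER_OK. */
	bool needs_teardown = !!(diff_features & NEEDS_TEARDOWN_MASK);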
* Re: [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary
2024-06-17 15:07 ` [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary Dragos Tatulea
@ 2024-06-19 16:14 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 16:14 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> The vdpa device can be reset many times in sequence without any
> significant state changes in between. Previously this was not a problem:
> VQs were torn down only on first reset. But after VQ pre-creation was
> introduced, each reset will delete and re-create the hardware VQs and
> their associated resources.
>
> To solve this problem, avoid resetting hardware VQs if the VQs are still
> in a blank state.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 30 +++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index d80d6b47da61..1a5ee0d2b47f 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3134,18 +3134,41 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev *mvdev)
> mvdev->group2asid[i] = 0;
> }
>
> +static bool needs_vqs_reset(const struct mlx5_vdpa_dev *mvdev)
> +{
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[0];
> +
> + if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)
> + return true;
> +
> + if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT)
> + return true;
> +
> + return mvq->modified_fields & (
> + MLX5_VIRTQ_MODIFY_MASK_STATE |
> + MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS |
> + MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
> + MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX
> + );
> +}
> +
> static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + bool vq_reset;
>
> print_status(mvdev, 0, true);
> mlx5_vdpa_info(mvdev, "performing device reset\n");
>
> down_write(&ndev->reslock);
> unregister_link_notifier(ndev);
> - teardown_vq_resources(ndev);
> - init_mvqs(ndev);
> + vq_reset = needs_vqs_reset(mvdev);
> + if (vq_reset) {
> + teardown_vq_resources(ndev);
> + init_mvqs(ndev);
> + }
>
> if (flags & VDPA_RESET_F_CLEAN_MAP)
> mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> @@ -3165,7 +3188,8 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> if (mlx5_vdpa_create_dma_mr(mvdev))
> mlx5_vdpa_warn(mvdev, "create MR failed\n");
> }
> - setup_vq_resources(ndev, false);
> + if (vq_reset)
> + setup_vq_resources(ndev, false);
> up_write(&ndev->reslock);
>
> return 0;
>
> --
> 2.45.1
>
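Two representative reset sequences, traced against needs_vqs_reset() above
(the check samples vqs[0], which is representative right after
pre-creation):

	/* Sequence 1: reset straight after "vdpa dev add".
	 *   DRIVER_OK was never set, vqs[0].fw_state is still
	 *   MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT and no state/address/index
	 *   modifications are pending. needs_vqs_reset() returns false and
	 *   the pre-created blank VQs survive the reset untouched.
	 *
	 * Sequence 2: reset after the guest reached DRIVER_OK.
	 *   needs_vqs_reset() returns true; the VQs are torn down and
	 *   re-created blank, ready for the next DRIVER_OK. */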
* Re: [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready()
2024-06-17 15:07 ` [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() Dragos Tatulea
@ 2024-06-19 16:39 ` Eugenio Perez Martin
0 siblings, 0 replies; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-06-19 16:39 UTC (permalink / raw)
To: Dragos Tatulea
Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Saeed Mahameed,
Leon Romanovsky, Tariq Toukan, Si-Wei Liu, virtualization,
linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> VQ indices in the range [cur_num_vqs, max_vqs) represent queues that
> have not yet been activated. .set_vq_ready() should not activate these
> VQs.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 1a5ee0d2b47f..a969a7f105a6 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1575,6 +1575,9 @@ static int resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq
> if (!mvq->initialized)
> return 0;
>
> + if (mvq->index >= ndev->cur_num_vqs)
> + return 0;
> +
> switch (mvq->fw_state) {
> case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
> /* Due to a FW quirk we need to modify the VQ fields first then change state.
>
> --
> 2.45.1
>
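A small worked example of the new guard, with made-up numbers: suppose the
device exposes max_vqs = 16 but only one queue pair has been negotiated, so
ndev->cur_num_vqs = 2.

	/* resume_vq() called on a non-active VQ, e.g. mvq->index == 5 while
	 * cur_num_vqs == 2: the new check returns 0 early and the VQ stays
	 * blank instead of being moved to the Ready state. */
	if (mvq->index >= ndev->cur_num_vqs)
		return 0;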
* Re: [PATCH vhost 08/23] vdpa/mlx5: Clear and reinitialize software VQ data on reset
2024-06-19 11:28 ` Eugenio Perez Martin
@ 2024-06-19 17:03 ` Dragos Tatulea
0 siblings, 0 replies; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-19 17:03 UTC (permalink / raw)
To: eperezma@redhat.com
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan,
linux-kernel@vger.kernel.org, Cosmin Ratiu, si-wei.liu@oracle.com,
jasowang@redhat.com, mst@redhat.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Wed, 2024-06-19 at 13:28 +0200, Eugenio Perez Martin wrote:
> On Mon, Jun 17, 2024 at 5:08 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > The hardware VQ configuration is mirrored by data in struct
> > mlx5_vdpa_virtqueue. Instead of clearing just a few fields at reset,
> > fully clear the struct and initialize with the appropriate default
> > values.
> >
> > As clear_vqs_ready() is used only during reset, get rid of it.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
>
> Acked-by: Eugenio Pérez <eperezma@redhat.com>
>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 16 +++-------------
> > 1 file changed, 3 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index c8b5c87f001d..de013b5a2815 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2941,18 +2941,6 @@ static void teardown_vq_resources(struct mlx5_vdpa_net *ndev)
> > ndev->setup = false;
> > }
> >
> > -static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
> > -{
> > - int i;
> > -
> > - for (i = 0; i < ndev->mvdev.max_vqs; i++) {
> > - ndev->vqs[i].ready = false;
> > - ndev->vqs[i].modified_fields = 0;
> > - }
> > -
> > - ndev->mvdev.cvq.ready = false;
> > -}
> > -
> > static int setup_cvq_vring(struct mlx5_vdpa_dev *mvdev)
> > {
> > struct mlx5_control_vq *cvq = &mvdev->cvq;
> > @@ -3035,12 +3023,14 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > down_write(&ndev->reslock);
> > unregister_link_notifier(ndev);
> > teardown_vq_resources(ndev);
> > - clear_vqs_ready(ndev);
> > + init_mvqs(ndev);
>
> Nitpick / suggestion if you have to send a v2: the init_mvqs function
> name sounds to me like it can allocate stuff that needs to be released.
> But I'm very bad at naming :). Maybe something like
> "mvqs_set_defaults" or similar?
Makes sense. I think I will call it mvqs_reset / reset_mvqs to keep things
consistent.
Thanks,
Dragos
>
> > +
> > if (flags & VDPA_RESET_F_CLEAN_MAP)
> > mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
> > ndev->mvdev.status = 0;
> > ndev->mvdev.suspended = false;
> > ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
> > + ndev->mvdev.cvq.ready = false;
> > ndev->mvdev.cvq.received_desc = 0;
> > ndev->mvdev.cvq.completed_desc = 0;
> > memset(ndev->event_cbs, 0, sizeof(*ndev->event_cbs) * (mvdev->max_vqs + 1));
> >
> > --
> > 2.45.1
> >
>
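For readers skimming the thread: "fully clear the struct" boils down to
something like the sketch below. This is an approximation of what
init_mvqs() is expected to do after this patch, not the literal code; the
index re-derivation is an assumption here, and the exact default fields
follow the later patches in the series.

	static void init_mvqs_sketch(struct mlx5_vdpa_net *ndev)
	{
		int i;

		for (i = 0; i < ndev->mvdev.max_vqs; i++) {
			struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[i];

			/* Wipes ready, modified_fields and everything else. */
			memset(mvq, 0, sizeof(*mvq));
			mvq->index = i;	/* assumed to be re-derived here */
			mvq->ndev = ndev;
			mvq->fwqp.fw = true;
			mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
		}
	}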
* Re: [PATCH vhost 11/23] vdpa/mlx5: Set an initial size on the VQ
2024-06-19 15:08 ` Eugenio Perez Martin
@ 2024-06-19 17:06 ` Dragos Tatulea
0 siblings, 0 replies; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-19 17:06 UTC (permalink / raw)
To: eperezma@redhat.com
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan,
linux-kernel@vger.kernel.org, Cosmin Ratiu, si-wei.liu@oracle.com,
jasowang@redhat.com, mst@redhat.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Wed, 2024-06-19 at 17:08 +0200, Eugenio Perez Martin wrote:
> On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > The virtqueue size is a prerequisite for setting up any virtqueue
> > resources. For the upcoming optimization of creating virtqueues at
> > device add, the virtqueue size has to be configured.
> >
> > Store the default queue size in struct mlx5_vdpa_net to make it easy in
> > the future to pre-configure this default value via the vdpa tool.
> >
> > The queue size check in setup_vq() will always be false. So remove it.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 ++++---
> > drivers/vdpa/mlx5/net/mlx5_vnet.h | 1 +
> > 2 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 245b5dac98d3..1181e0ac3671 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -58,6 +58,8 @@ MODULE_LICENSE("Dual BSD/GPL");
> > */
> > #define MLX5V_DEFAULT_VQ_COUNT 2
> >
> > +#define MLX5V_DEFAULT_VQ_SIZE 256
> > +
> > struct mlx5_vdpa_cq_buf {
> > struct mlx5_frag_buf_ctrl fbc;
> > struct mlx5_frag_buf frag_buf;
> > @@ -1445,9 +1447,6 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> > u16 idx = mvq->index;
> > int err;
> >
> > - if (!mvq->num_ent)
> > - return 0;
> > -
> > if (mvq->initialized)
> > return 0;
> >
> > @@ -3523,6 +3522,7 @@ static void init_mvqs(struct mlx5_vdpa_net *ndev)
> > mvq->ndev = ndev;
> > mvq->fwqp.fw = true;
> > mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
> > + mvq->num_ent = ndev->default_queue_size;
> > }
> > }
> >
> > @@ -3660,6 +3660,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > goto err_alloc;
> > }
> > ndev->cur_num_vqs = MLX5V_DEFAULT_VQ_COUNT;
> > + ndev->default_queue_size = MLX5V_DEFAULT_VQ_SIZE;
> >
> > init_mvqs(ndev);
> > allocate_irqs(ndev);
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> > index 90b556a57971..2ada29767cc5 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.h
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> > @@ -58,6 +58,7 @@ struct mlx5_vdpa_net {
> > bool setup;
> > u32 cur_num_vqs;
> > u32 rqt_size;
> > + u16 default_queue_size;
>
> It seems to me this is only assigned here and not used in the rest of
> the series, why allocate a member here instead of using macro
> directly?
It is used in init_mvqs(). I wanted to make it easy in case we add the default
queue size to the vdpa tool. I'm also OK with switching to constants for now.
Thank you for your reviews!
Thanks,
Dragos
>
> > bool nb_registered;
> > struct notifier_block nb;
> > struct vdpa_callback config_cb;
> >
> > --
> > 2.45.1
> >
>
* Re: [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device
2024-06-17 15:07 ` [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device Dragos Tatulea
@ 2024-06-23 11:19 ` Zhu Yanjun
2024-06-26 9:28 ` Dragos Tatulea
0 siblings, 1 reply; 53+ messages in thread
From: Zhu Yanjun @ 2024-06-23 11:19 UTC (permalink / raw)
To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On 2024/6/17 23:07, Dragos Tatulea wrote:
> Start using the suspend/resume_vq() error return codes previously added.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index f5d5b25cdb01..0e1c1b7ff297 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3436,22 +3436,25 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + int err;
Reverse Christmas Tree?
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
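For context, "reverse Christmas tree" refers to the netdev convention of
ordering local variable declarations from the longest line down to the
shortest, dependencies permitting. A generic illustration, not taken from
this patch:

	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);	/* longest line first */
	unsigned long flags;				/* hypothetical extra */
	int err;					/* shortest line last */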
>
> mlx5_vdpa_info(mvdev, "suspending device\n");
>
> down_write(&ndev->reslock);
> unregister_link_notifier(ndev);
> - suspend_vqs(ndev);
> + err = suspend_vqs(ndev);
> mlx5_vdpa_cvq_suspend(mvdev);
> mvdev->suspended = true;
> up_write(&ndev->reslock);
> - return 0;
> +
> + return err;
> }
>
> static int mlx5_vdpa_resume(struct vdpa_device *vdev)
> {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> struct mlx5_vdpa_net *ndev;
> + int err;
>
> ndev = to_mlx5_vdpa_ndev(mvdev);
>
> @@ -3459,10 +3462,11 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
>
> down_write(&ndev->reslock);
> mvdev->suspended = false;
> - resume_vqs(ndev);
> + err = resume_vqs(ndev);
> register_link_notifier(ndev);
> up_write(&ndev->reslock);
> - return 0;
> +
> + return err;
> }
>
> static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-06-19 15:54 ` Eugenio Perez Martin
@ 2024-06-26 9:27 ` Dragos Tatulea
2024-07-03 16:01 ` Eugenio Perez Martin
0 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-26 9:27 UTC (permalink / raw)
To: eperezma@redhat.com
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan,
linux-kernel@vger.kernel.org, Cosmin Ratiu, si-wei.liu@oracle.com,
jasowang@redhat.com, mst@redhat.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > Currently, hardware VQs are created right when the vdpa device gets into
> > DRIVER_OK state. That is easier because most of the VQ state is known by
> > then.
> >
> > This patch switches to creating all VQs and their associated resources
> > at device creation time. The motivation is to reduce the vdpa device
> > live migration downtime by moving the expensive operation of creating
> > all the hardware VQs and their associated resources out of downtime on
> > the destination VM.
> >
> > The VQs are now created in a blank state. The VQ configuration will
> > happen later, on DRIVER_OK. Then the configuration will be applied when
> > the VQs are moved to the Ready state.
> >
> > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > needed: now that the VQ is already created, a resume_vq() would be
> > triggered too early, while no mr has been configured yet. Skip calling
> > resume_vq() in this case and let it be handled during DRIVER_OK.
> >
> > For virtio-vdpa, the device configuration is done earlier during
> > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > setup_vq_resources() a second time in that case.
> >
>
> I guess this happens if virtio_vdpa is already loaded, but I cannot
> see how this is different here. Apart from the IOTLB, what else does
> it change from the mlx5_vdpa POV?
>
I don't understand your question, could you rephrase or provide more context
please?
Thanks,
Dragos
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > 1 file changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 249b5afbe34a..b2836fd3d1dd 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > mvq = &ndev->vqs[idx];
> > if (!ready) {
> > suspend_vq(ndev, mvq);
> > - } else {
> > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > if (resume_vq(ndev, mvq))
> > ready = false;
> > }
> > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > goto err_setup;
> > }
> > register_link_notifier(ndev);
> > - err = setup_vq_resources(ndev, true);
> > - if (err) {
> > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > - goto err_driver;
> > + if (ndev->setup) {
> > + err = resume_vqs(ndev);
> > + if (err) {
> > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > + goto err_driver;
> > + }
> > + } else {
> > + err = setup_vq_resources(ndev, true);
> > + if (err) {
> > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > + goto err_driver;
> > + }
> > }
> > } else {
> > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > if (mlx5_vdpa_create_dma_mr(mvdev))
> > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > }
> > + setup_vq_resources(ndev, false);
> > up_write(&ndev->reslock);
> >
> > return 0;
> > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > goto err_reg;
> >
> > mgtdev->ndev = ndev;
> > +
> > + /* For virtio-vdpa, the device was set up during device register. */
> > + if (ndev->setup)
> > + return 0;
> > +
> > + down_write(&ndev->reslock);
> > + err = setup_vq_resources(ndev, false);
> > + up_write(&ndev->reslock);
> > + if (err)
> > + goto err_setup_vq_res;
> > +
> > return 0;
> >
> > +err_setup_vq_res:
> > + _vdpa_unregister_device(&mvdev->vdev);
> > err_reg:
> > destroy_workqueue(mvdev->wq);
> > err_res2:
> > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> >
> > unregister_link_notifier(ndev);
> > _vdpa_unregister_device(dev);
> > +
> > + down_write(&ndev->reslock);
> > + teardown_vq_resources(ndev);
> > + up_write(&ndev->reslock);
> > +
> > wq = mvdev->wq;
> > mvdev->wq = NULL;
> > destroy_workqueue(wq);
> >
> > --
> > 2.45.1
> >
>
* Re: [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device
2024-06-23 11:19 ` Zhu Yanjun
@ 2024-06-26 9:28 ` Dragos Tatulea
0 siblings, 0 replies; 53+ messages in thread
From: Dragos Tatulea @ 2024-06-26 9:28 UTC (permalink / raw)
To: xuanzhuo@linux.alibaba.com, Tariq Toukan, eperezma@redhat.com,
yanjun.zhu@linux.dev, si-wei.liu@oracle.com, mst@redhat.com,
jasowang@redhat.com, Saeed Mahameed, leon@kernel.org
Cc: Cosmin Ratiu, linux-kernel@vger.kernel.org,
virtualization@lists.linux.dev, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org
On Sun, 2024-06-23 at 19:19 +0800, Zhu Yanjun wrote:
> On 2024/6/17 23:07, Dragos Tatulea wrote:
> > Start using the suspend/resume_vq() error return codes previously added.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 12 ++++++++----
> > 1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index f5d5b25cdb01..0e1c1b7ff297 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3436,22 +3436,25 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
> > {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> > + int err;
>
> Reverse Christmas Tree?
I would have fixed the code if it had been part of the patch. But it isn't.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>
Thanks!
> Zhu Yanjun
> >
> > mlx5_vdpa_info(mvdev, "suspending device\n");
> >
> > down_write(&ndev->reslock);
> > unregister_link_notifier(ndev);
> > - suspend_vqs(ndev);
> > + err = suspend_vqs(ndev);
> > mlx5_vdpa_cvq_suspend(mvdev);
> > mvdev->suspended = true;
> > up_write(&ndev->reslock);
> > - return 0;
> > +
> > + return err;
> > }
> >
> > static int mlx5_vdpa_resume(struct vdpa_device *vdev)
> > {
> > struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > struct mlx5_vdpa_net *ndev;
> > + int err;
> >
> > ndev = to_mlx5_vdpa_ndev(mvdev);
> >
> > @@ -3459,10 +3462,11 @@ static int mlx5_vdpa_resume(struct vdpa_device *vdev)
> >
> > down_write(&ndev->reslock);
> > mvdev->suspended = false;
> > - resume_vqs(ndev);
> > + err = resume_vqs(ndev);
> > register_link_notifier(ndev);
> > up_write(&ndev->reslock);
> > - return 0;
> > +
> > + return err;
> > }
> >
> > static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
> >
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-06-26 9:27 ` Dragos Tatulea
@ 2024-07-03 16:01 ` Eugenio Perez Martin
2024-07-08 11:01 ` Dragos Tatulea
0 siblings, 1 reply; 53+ messages in thread
From: Eugenio Perez Martin @ 2024-07-03 16:01 UTC (permalink / raw)
To: Dragos Tatulea
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan,
linux-kernel@vger.kernel.org, Cosmin Ratiu, si-wei.liu@oracle.com,
jasowang@redhat.com, mst@redhat.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > Currently, hardware VQs are created right when the vdpa device gets into
> > > DRIVER_OK state. That is easier because most of the VQ state is known by
> > > then.
> > >
> > > This patch switches to creating all VQs and their associated resources
> > > at device creation time. The motivation is to reduce the vdpa device
> > > live migration downtime by moving the expensive operation of creating
> > > all the hardware VQs and their associated resources out of downtime on
> > > the destination VM.
> > >
> > > The VQs are now created in a blank state. The VQ configuration will
> > > happen later, on DRIVER_OK. Then the configuration will be applied when
> > > the VQs are moved to the Ready state.
> > >
> > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > > needed: now that the VQ is already created, a resume_vq() would be
> > > triggered too early, while no mr has been configured yet. Skip calling
> > > resume_vq() in this case and let it be handled during DRIVER_OK.
> > >
> > > For virtio-vdpa, the device configuration is done earlier during
> > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > setup_vq_resources() a second time in that case.
> > >
> >
> > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > see how this is different here. Apart from the IOTLB, what else does
> > it change from the mlx5_vdpa POV?
> >
> I don't understand your question, could you rephrase or provide more context
> please?
>
My main point is that the vdpa parent driver should not be able to
tell the difference between vhost_vdpa and virtio_vdpa. The only
difference I can think of is because of the vhost IOTLB handling.
Do you also observe this behavior if you add the device with "vdpa
add" without the virtio_vdpa module loaded, and then modprobe
virtio_vdpa?
At least the comment should be something in the line of "If we have
all the information to initialize the device, pre-warm it here" or
similar.
> Thanks,
> Dragos
>
> > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > ---
> > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > > 1 file changed, 32 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > > mvq = &ndev->vqs[idx];
> > > if (!ready) {
> > > suspend_vq(ndev, mvq);
> > > - } else {
> > > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > if (resume_vq(ndev, mvq))
> > > ready = false;
> > > }
> > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > goto err_setup;
> > > }
> > > register_link_notifier(ndev);
> > > - err = setup_vq_resources(ndev, true);
> > > - if (err) {
> > > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > - goto err_driver;
> > > + if (ndev->setup) {
> > > + err = resume_vqs(ndev);
> > > + if (err) {
> > > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > > + goto err_driver;
> > > + }
> > > + } else {
> > > + err = setup_vq_resources(ndev, true);
> > > + if (err) {
> > > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > + goto err_driver;
> > > + }
> > > }
> > > } else {
> > > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > > if (mlx5_vdpa_create_dma_mr(mvdev))
> > > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > > }
> > > + setup_vq_resources(ndev, false);
> > > up_write(&ndev->reslock);
> > >
> > > return 0;
> > > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > > goto err_reg;
> > >
> > > mgtdev->ndev = ndev;
> > > +
> > > + /* For virtio-vdpa, the device was set up during device register. */
> > > + if (ndev->setup)
> > > + return 0;
> > > +
> > > + down_write(&ndev->reslock);
> > > + err = setup_vq_resources(ndev, false);
> > > + up_write(&ndev->reslock);
> > > + if (err)
> > > + goto err_setup_vq_res;
> > > +
> > > return 0;
> > >
> > > +err_setup_vq_res:
> > > + _vdpa_unregister_device(&mvdev->vdev);
> > > err_reg:
> > > destroy_workqueue(mvdev->wq);
> > > err_res2:
> > > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > >
> > > unregister_link_notifier(ndev);
> > > _vdpa_unregister_device(dev);
> > > +
> > > + down_write(&ndev->reslock);
> > > + teardown_vq_resources(ndev);
> > > + up_write(&ndev->reslock);
> > > +
> > > wq = mvdev->wq;
> > > mvdev->wq = NULL;
> > > destroy_workqueue(wq);
> > >
> > > --
> > > 2.45.1
> > >
> >
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-07-03 16:01 ` Eugenio Perez Martin
@ 2024-07-08 11:01 ` Dragos Tatulea
2024-07-08 11:11 ` Michael S. Tsirkin
0 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-07-08 11:01 UTC (permalink / raw)
To: eperezma@redhat.com
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan,
linux-kernel@vger.kernel.org, Cosmin Ratiu, jasowang@redhat.com,
mst@redhat.com, si-wei.liu@oracle.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> >
> > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > >
> > > > Currently, hardware VQs are created right when the vdpa device gets into
> > > > DRIVER_OK state. That is easier because most of the VQ state is known by
> > > > then.
> > > >
> > > > This patch switches to creating all VQs and their associated resources
> > > > at device creation time. The motivation is to reduce the vdpa device
> > > > live migration downtime by moving the expensive operation of creating
> > > > all the hardware VQs and their associated resources out of downtime on
> > > > the destination VM.
> > > >
> > > > The VQs are now created in a blank state. The VQ configuration will
> > > > happen later, on DRIVER_OK. Then the configuration will be applied when
> > > > the VQs are moved to the Ready state.
> > > >
> > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > > > needed: now that the VQ is already created, a resume_vq() would be
> > > > triggered too early, while no mr has been configured yet. Skip calling
> > > > resume_vq() in this case and let it be handled during DRIVER_OK.
> > > >
> > > > For virtio-vdpa, the device configuration is done earlier during
> > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > setup_vq_resources() a second time in that case.
> > > >
> > >
> > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > see how this is different here. Apart from the IOTLB, what else does
> > > it change from the mlx5_vdpa POV?
> > >
> > I don't understand your question, could you rephrase or provide more context
> > please?
> >
>
> My main point is that the vdpa parent driver should not be able to
> tell the difference between vhost_vdpa and virtio_vdpa. The only
> difference I can think of is because of the vhost IOTLB handling.
>
> Do you also observe this behavior if you add the device with "vdpa
> add" without the virtio_vdpa module loaded, and then modprobe
> virtio_vdpa?
>
Aah, now I understand what you mean. Indeed in my tests I was loading the
virtio_vdpa module before adding the device. When doing it the other way around
the device doesn't get configured during probe.
> At least the comment should be something in the line of "If we have
> all the information to initialize the device, pre-warm it here" or
> similar.
Makes sense. I will send a v3 with the commit + comment message update.
>
> > Thanks,
> > Dragos
> >
> > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > > ---
> > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > > > 1 file changed, 32 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > > > mvq = &ndev->vqs[idx];
> > > > if (!ready) {
> > > > suspend_vq(ndev, mvq);
> > > > - } else {
> > > > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > if (resume_vq(ndev, mvq))
> > > > ready = false;
> > > > }
> > > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > > goto err_setup;
> > > > }
> > > > register_link_notifier(ndev);
> > > > - err = setup_vq_resources(ndev, true);
> > > > - if (err) {
> > > > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > - goto err_driver;
> > > > + if (ndev->setup) {
> > > > + err = resume_vqs(ndev);
> > > > + if (err) {
> > > > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > > > + goto err_driver;
> > > > + }
> > > > + } else {
> > > > + err = setup_vq_resources(ndev, true);
> > > > + if (err) {
> > > > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > + goto err_driver;
> > > > + }
> > > > }
> > > > } else {
> > > > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > > > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > > > if (mlx5_vdpa_create_dma_mr(mvdev))
> > > > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > > > }
> > > > + setup_vq_resources(ndev, false);
> > > > up_write(&ndev->reslock);
> > > >
> > > > return 0;
> > > > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > > > goto err_reg;
> > > >
> > > > mgtdev->ndev = ndev;
> > > > +
> > > > + /* For virtio-vdpa, the device was set up during device register. */
> > > > + if (ndev->setup)
> > > > + return 0;
> > > > +
> > > > + down_write(&ndev->reslock);
> > > > + err = setup_vq_resources(ndev, false);
> > > > + up_write(&ndev->reslock);
> > > > + if (err)
> > > > + goto err_setup_vq_res;
> > > > +
> > > > return 0;
> > > >
> > > > +err_setup_vq_res:
> > > > + _vdpa_unregister_device(&mvdev->vdev);
> > > > err_reg:
> > > > destroy_workqueue(mvdev->wq);
> > > > err_res2:
> > > > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > > >
> > > > unregister_link_notifier(ndev);
> > > > _vdpa_unregister_device(dev);
> > > > +
> > > > + down_write(&ndev->reslock);
> > > > + teardown_vq_resources(ndev);
> > > > + up_write(&ndev->reslock);
> > > > +
> > > > wq = mvdev->wq;
> > > > mvdev->wq = NULL;
> > > > destroy_workqueue(wq);
> > > >
> > > > --
> > > > 2.45.1
> > > >
> > >
> >
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-07-08 11:01 ` Dragos Tatulea
@ 2024-07-08 11:11 ` Michael S. Tsirkin
2024-07-08 11:17 ` Dragos Tatulea
0 siblings, 1 reply; 53+ messages in thread
From: Michael S. Tsirkin @ 2024-07-08 11:11 UTC (permalink / raw)
To: Dragos Tatulea
Cc: eperezma@redhat.com, linux-rdma@vger.kernel.org,
xuanzhuo@linux.alibaba.com, virtualization@lists.linux.dev,
Tariq Toukan, linux-kernel@vger.kernel.org, Cosmin Ratiu,
jasowang@redhat.com, si-wei.liu@oracle.com, Saeed Mahameed,
leon@kernel.org, netdev@vger.kernel.org
On Mon, Jul 08, 2024 at 11:01:39AM +0000, Dragos Tatulea wrote:
> On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> > On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > >
> > > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > Currently, hardware VQs are created right when the vdpa device gets into
> > > > > DRIVER_OK state. That is easier because most of the VQ state is known by
> > > > > then.
> > > > >
> > > > > This patch switches to creating all VQs and their associated resources
> > > > > at device creation time. The motivation is to reduce the vdpa device
> > > > > live migration downtime by moving the expensive operation of creating
> > > > > all the hardware VQs and their associated resources out of downtime on
> > > > > the destination VM.
> > > > >
> > > > > The VQs are now created in a blank state. The VQ configuration will
> > > > > happen later, on DRIVER_OK. Then the configuration will be applied when
> > > > > the VQs are moved to the Ready state.
> > > > >
> > > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > > > > needed: now that the VQ is already created, a resume_vq() would be
> > > > > triggered too early, while no mr has been configured yet. Skip calling
> > > > > resume_vq() in this case and let it be handled during DRIVER_OK.
> > > > >
> > > > > For virtio-vdpa, the device configuration is done earlier during
> > > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > > setup_vq_resources() a second time in that case.
> > > > >
> > > >
> > > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > > see how this is different here. Apart from the IOTLB, what else does
> > > > it change from the mlx5_vdpa POV?
> > > >
> > > I don't understand your question, could you rephrase or provide more context
> > > please?
> > >
> >
> > My main point is that the vdpa parent driver should not be able to
> > tell the difference between vhost_vdpa and virtio_vdpa. The only
> > difference I can think of is because of the vhost IOTLB handling.
> >
> > Do you also observe this behavior if you add the device with "vdpa
> > add" without the virtio_vdpa module loaded, and then modprobe
> > virtio_vdpa?
> >
> Aah, now I understand what you mean. Indeed in my tests I was loading the
> virtio_vdpa module before adding the device. When doing it the other way around
> the device doesn't get configured during probe.
>
>
> > At least the comment should be something in the line of "If we have
> > all the information to initialize the device, pre-warm it here" or
> > similar.
> Makes sense. I will send a v3 with the commit + comment message update.
Is commit update the only change then?
> >
> > > Thanks,
> > > Dragos
> > >
> > > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > > > ---
> > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > > > > 1 file changed, 32 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > > > > mvq = &ndev->vqs[idx];
> > > > > if (!ready) {
> > > > > suspend_vq(ndev, mvq);
> > > > > - } else {
> > > > > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > > if (resume_vq(ndev, mvq))
> > > > > ready = false;
> > > > > }
> > > > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > > > goto err_setup;
> > > > > }
> > > > > register_link_notifier(ndev);
> > > > > - err = setup_vq_resources(ndev, true);
> > > > > - if (err) {
> > > > > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > - goto err_driver;
> > > > > + if (ndev->setup) {
> > > > > + err = resume_vqs(ndev);
> > > > > + if (err) {
> > > > > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > > > > + goto err_driver;
> > > > > + }
> > > > > + } else {
> > > > > + err = setup_vq_resources(ndev, true);
> > > > > + if (err) {
> > > > > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > + goto err_driver;
> > > > > + }
> > > > > }
> > > > > } else {
> > > > > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > > > > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > > > > if (mlx5_vdpa_create_dma_mr(mvdev))
> > > > > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > > > > }
> > > > > + setup_vq_resources(ndev, false);
> > > > > up_write(&ndev->reslock);
> > > > >
> > > > > return 0;
> > > > > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > > > > goto err_reg;
> > > > >
> > > > > mgtdev->ndev = ndev;
> > > > > +
> > > > > + /* For virtio-vdpa, the device was set up during device register. */
> > > > > + if (ndev->setup)
> > > > > + return 0;
> > > > > +
> > > > > + down_write(&ndev->reslock);
> > > > > + err = setup_vq_resources(ndev, false);
> > > > > + up_write(&ndev->reslock);
> > > > > + if (err)
> > > > > + goto err_setup_vq_res;
> > > > > +
> > > > > return 0;
> > > > >
> > > > > +err_setup_vq_res:
> > > > > + _vdpa_unregister_device(&mvdev->vdev);
> > > > > err_reg:
> > > > > destroy_workqueue(mvdev->wq);
> > > > > err_res2:
> > > > > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > > > >
> > > > > unregister_link_notifier(ndev);
> > > > > _vdpa_unregister_device(dev);
> > > > > +
> > > > > + down_write(&ndev->reslock);
> > > > > + teardown_vq_resources(ndev);
> > > > > + up_write(&ndev->reslock);
> > > > > +
> > > > > wq = mvdev->wq;
> > > > > mvdev->wq = NULL;
> > > > > destroy_workqueue(wq);
> > > > >
> > > > > --
> > > > > 2.45.1
> > > > >
> > > >
> > >
> >
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-07-08 11:11 ` Michael S. Tsirkin
@ 2024-07-08 11:17 ` Dragos Tatulea
2024-07-08 11:25 ` Michael S. Tsirkin
0 siblings, 1 reply; 53+ messages in thread
From: Dragos Tatulea @ 2024-07-08 11:17 UTC (permalink / raw)
To: mst@redhat.com
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan, eperezma@redhat.com,
linux-kernel@vger.kernel.org, Cosmin Ratiu, jasowang@redhat.com,
leon@kernel.org, si-wei.liu@oracle.com, Saeed Mahameed,
netdev@vger.kernel.org
On Mon, 2024-07-08 at 07:11 -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 08, 2024 at 11:01:39AM +0000, Dragos Tatulea wrote:
> > On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> > > On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > >
> > > > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > >
> > > > > > Currently, hardware VQs are created right when the vdpa device gets into
> > > > > > DRIVER_OK state. That is easier because most of the VQ state is known by
> > > > > > then.
> > > > > >
> > > > > > This patch switches to creating all VQs and their associated resources
> > > > > > at device creation time. The motivation is to reduce the vdpa device
> > > > > > live migration downtime by moving the expensive operation of creating
> > > > > > all the hardware VQs and their associated resources out of downtime on
> > > > > > the destination VM.
> > > > > >
> > > > > > The VQs are now created in a blank state. The VQ configuration will
> > > > > > happen later, on DRIVER_OK. Then the configuration will be applied when
> > > > > > the VQs are moved to the Ready state.
> > > > > >
> > > > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > > > > > needed: now that the VQ is already created, a resume_vq() would be
> > > > > > triggered too early, while no mr has been configured yet. Skip calling
> > > > > > resume_vq() in this case and let it be handled during DRIVER_OK.
> > > > > >
> > > > > > For virtio-vdpa, the device configuration is done earlier during
> > > > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > > > setup_vq_resources() a second time in that case.
> > > > > >
> > > > >
> > > > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > > > see how this is different here. Apart from the IOTLB, what else does
> > > > > it change from the mlx5_vdpa POV?
> > > > >
> > > > I don't understand your question, could you rephrase or provide more context
> > > > please?
> > > >
> > >
> > > My main point is that the vdpa parent driver should not be able to
> > > tell the difference between vhost_vdpa and virtio_vdpa. The only
> > > difference I can think of is because of the vhost IOTLB handling.
> > >
> > > Do you also observe this behavior if you add the device with "vdpa
> > > add" without the virtio_vdpa module loaded, and then modprobe
> > > virtio_vdpa?
> > >
> > Aah, now I understand what you mean. Indeed in my tests I was loading the
> > virtio_vdpa module before adding the device. When doing it the other way around
> > the device doesn't get configured during probe.
> >
> >
> > > At least the comment should be something in the line of "If we have
> > > all the information to initialize the device, pre-warm it here" or
> > > similar.
> > Makes sense. I will send a v3 with the commit + comment message update.
>
>
> Is commit update the only change then?
I was planning to drop the paragraph in the commit message (it is confusing) and
edit the comment below (scroll down to see which).
Let me know if I should send the v3 or not. I have it prepared.
>
> > >
> > > > Thanks,
> > > > Dragos
> > > >
> > > > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > > > > ---
> > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > > > > > 1 file changed, 32 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > > > > > mvq = &ndev->vqs[idx];
> > > > > > if (!ready) {
> > > > > > suspend_vq(ndev, mvq);
> > > > > > - } else {
> > > > > > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > > > if (resume_vq(ndev, mvq))
> > > > > > ready = false;
> > > > > > }
> > > > > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > > > > goto err_setup;
> > > > > > }
> > > > > > register_link_notifier(ndev);
> > > > > > - err = setup_vq_resources(ndev, true);
> > > > > > - if (err) {
> > > > > > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > > - goto err_driver;
> > > > > > + if (ndev->setup) {
> > > > > > + err = resume_vqs(ndev);
> > > > > > + if (err) {
> > > > > > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > > > > > + goto err_driver;
> > > > > > + }
> > > > > > + } else {
> > > > > > + err = setup_vq_resources(ndev, true);
> > > > > > + if (err) {
> > > > > > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > > + goto err_driver;
> > > > > > + }
> > > > > > }
> > > > > > } else {
> > > > > > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > > > > > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > > > > > if (mlx5_vdpa_create_dma_mr(mvdev))
> > > > > > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > > > > > }
> > > > > > + setup_vq_resources(ndev, false);
> > > > > > up_write(&ndev->reslock);
> > > > > >
> > > > > > return 0;
> > > > > > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > > > > > goto err_reg;
> > > > > >
> > > > > > mgtdev->ndev = ndev;
> > > > > > +
> > > > > > + /* For virtio-vdpa, the device was set up during device register. */
> > > > > > + if (ndev->setup)
> > > > > > + return 0;
> > > > > > +
This comment updated to:
/* The VQs might have been pre-created during device register.
* This happens when virtio_vdpa is loaded before the vdpa device is added.
*/
> > > > > > + down_write(&ndev->reslock);
> > > > > > + err = setup_vq_resources(ndev, false);
> > > > > > + up_write(&ndev->reslock);
> > > > > > + if (err)
> > > > > > + goto err_setup_vq_res;
> > > > > > +
> > > > > > return 0;
> > > > > >
> > > > > > +err_setup_vq_res:
> > > > > > + _vdpa_unregister_device(&mvdev->vdev);
> > > > > > err_reg:
> > > > > > destroy_workqueue(mvdev->wq);
> > > > > > err_res2:
> > > > > > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > > > > >
> > > > > > unregister_link_notifier(ndev);
> > > > > > _vdpa_unregister_device(dev);
> > > > > > +
> > > > > > + down_write(&ndev->reslock);
> > > > > > + teardown_vq_resources(ndev);
> > > > > > + up_write(&ndev->reslock);
> > > > > > +
> > > > > > wq = mvdev->wq;
> > > > > > mvdev->wq = NULL;
> > > > > > destroy_workqueue(wq);
> > > > > >
> > > > > > --
> > > > > > 2.45.1
> > > > > >
> > > > >
> > > >
> > >
> >
>
Thanks,
Dragos
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-07-08 11:17 ` Dragos Tatulea
@ 2024-07-08 11:25 ` Michael S. Tsirkin
0 siblings, 0 replies; 53+ messages in thread
From: Michael S. Tsirkin @ 2024-07-08 11:25 UTC (permalink / raw)
To: Dragos Tatulea
Cc: linux-rdma@vger.kernel.org, xuanzhuo@linux.alibaba.com,
virtualization@lists.linux.dev, Tariq Toukan, eperezma@redhat.com,
linux-kernel@vger.kernel.org, Cosmin Ratiu, jasowang@redhat.com,
leon@kernel.org, si-wei.liu@oracle.com, Saeed Mahameed,
netdev@vger.kernel.org
On Mon, Jul 08, 2024 at 11:17:06AM +0000, Dragos Tatulea wrote:
> On Mon, 2024-07-08 at 07:11 -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 08, 2024 at 11:01:39AM +0000, Dragos Tatulea wrote:
> > > On Wed, 2024-07-03 at 18:01 +0200, Eugenio Perez Martin wrote:
> > > > On Wed, Jun 26, 2024 at 11:27 AM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > >
> > > > > On Wed, 2024-06-19 at 17:54 +0200, Eugenio Perez Martin wrote:
> > > > > > On Mon, Jun 17, 2024 at 5:09 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> > > > > > >
> > > > > > > Currently, hardware VQs are created right when the vdpa device gets into
> > > > > > > DRIVER_OK state. That is easier because most of the VQ state is known by
> > > > > > > then.
> > > > > > >
> > > > > > > This patch switches to creating all VQs and their associated resources
> > > > > > > at device creation time. The motivation is to reduce the vdpa device
> > > > > > > live migration downtime by moving the expensive operation of creating
> > > > > > > all the hardware VQs and their associated resources out of downtime on
> > > > > > > the destination VM.
> > > > > > >
> > > > > > > The VQs are now created in a blank state. The VQ configuration will
> > > > > > > happen later, on DRIVER_OK. Then the configuration will be applied when
> > > > > > > the VQs are moved to the Ready state.
> > > > > > >
> > > > > > > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > > > > > > needed: now that the VQ is already created, a resume_vq() would be
> > > > > > > triggered too early, while no mr has been configured yet. Skip calling
> > > > > > > resume_vq() in this case and let it be handled during DRIVER_OK.
> > > > > > >
> > > > > > > For virtio-vdpa, the device configuration is done earlier during
> > > > > > > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > > > > > > setup_vq_resources() a second time in that case.
> > > > > > >
> > > > > >
> > > > > > I guess this happens if virtio_vdpa is already loaded, but I cannot
> > > > > > see how this is different here. Apart from the IOTLB, what else does
> > > > > > it change from the mlx5_vdpa POV?
> > > > > >
> > > > > I don't understand your question, could you rephrase or provide more context
> > > > > please?
> > > > >
> > > >
> > > > My main point is that the vdpa parent driver should not be able to
> > > > tell the difference between vhost_vdpa and virtio_vdpa. The only
> > > > difference I can think of is because of the vhost IOTLB handling.
> > > >
> > > > Do you also observe this behavior if you add the device with "vdpa
> > > > add" without the virtio_vdpa module loaded, and then modprobe
> > > > virtio_vdpa?
> > > >
> > > Aah, now I understand what you mean. Indeed, in my tests I was loading the
> > > virtio_vdpa module before adding the device. When doing it the other way
> > > around, the device doesn't get configured during probe.
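> > >
> > > To make the ordering concrete, here is a condensed view of the .dev_add
> > > path from this patch (the registration call sits just above the quoted
> > > hunk; error logging is elided):
> > >
> > > 	/* Registration may synchronously probe virtio_vdpa, which
> > > 	 * drives the device to DRIVER_OK and runs setup_vq_resources()
> > > 	 * via .set_status().
> > > 	 */
> > > 	err = _vdpa_register_device(&mvdev->vdev, max_vqs);
> > > 	if (err)
> > > 		goto err_reg;
> > >
> > > 	if (ndev->setup)	/* probe already configured the VQs */
> > > 		return 0;
> > >
> > > 	/* No driver bound yet: pre-create the blank HW VQs here. */
> > > 	down_write(&ndev->reslock);
> > > 	err = setup_vq_resources(ndev, false);
> > > 	up_write(&ndev->reslock);
> > > 	if (err)
> > > 		goto err_setup_vq_res;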
> > >
> > >
> > > > At least the comment should be something along the lines of "If we have
> > > > all the information to initialize the device, pre-warm it here" or
> > > > similar.
> > > Makes sense. I will send a v3 with the commit message + comment update.
> >
> >
> > Is the commit update the only change then?
> I was planning to drop the paragraph in the commit message (it is confusing) and
> edit the comment below (scroll down to see which).
>
> Let me know if I should send the v3 or not. I have it prepared.
You can do this, but please document that the only change is in the commit log.
> >
> > > >
> > > > > Thanks,
> > > > > Dragos
> > > > >
> > > > > > > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > > > > > > ---
> > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > > > > > > 1 file changed, 32 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 249b5afbe34a..b2836fd3d1dd 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > > > > > > mvq = &ndev->vqs[idx];
> > > > > > > if (!ready) {
> > > > > > > suspend_vq(ndev, mvq);
> > > > > > > - } else {
> > > > > > > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > > > > if (resume_vq(ndev, mvq))
> > > > > > > ready = false;
> > > > > > > }
> > > > > > > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > > > > > > goto err_setup;
> > > > > > > }
> > > > > > > register_link_notifier(ndev);
> > > > > > > - err = setup_vq_resources(ndev, true);
> > > > > > > - if (err) {
> > > > > > > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > > > - goto err_driver;
> > > > > > > + if (ndev->setup) {
> > > > > > > + err = resume_vqs(ndev);
> > > > > > > + if (err) {
> > > > > > > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > > > > > > + goto err_driver;
> > > > > > > + }
> > > > > > > + } else {
> > > > > > > + err = setup_vq_resources(ndev, true);
> > > > > > > + if (err) {
> > > > > > > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > > > > > > + goto err_driver;
> > > > > > > + }
> > > > > > > }
> > > > > > > } else {
> > > > > > > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > > > > > > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > > > > > > if (mlx5_vdpa_create_dma_mr(mvdev))
> > > > > > > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > > > > > > }
> > > > > > > + setup_vq_resources(ndev, false);
> > > > > > > up_write(&ndev->reslock);
> > > > > > >
> > > > > > > return 0;
> > > > > > > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > > > > > > goto err_reg;
> > > > > > >
> > > > > > > mgtdev->ndev = ndev;
> > > > > > > +
> > > > > > > + /* For virtio-vdpa, the device was set up during device register. */
> > > > > > > + if (ndev->setup)
> > > > > > > + return 0;
> > > > > > > +
> This comment is updated to:
>
> /* The VQs might have been pre-created during device register.
>  * This happens when virtio_vdpa is loaded before the vdpa device is added.
>  */
>
>
> > > > > > > + down_write(&ndev->reslock);
> > > > > > > + err = setup_vq_resources(ndev, false);
> > > > > > > + up_write(&ndev->reslock);
> > > > > > > + if (err)
> > > > > > > + goto err_setup_vq_res;
> > > > > > > +
> > > > > > > return 0;
> > > > > > >
> > > > > > > +err_setup_vq_res:
> > > > > > > + _vdpa_unregister_device(&mvdev->vdev);
> > > > > > > err_reg:
> > > > > > > destroy_workqueue(mvdev->wq);
> > > > > > > err_res2:
> > > > > > > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> > > > > > >
> > > > > > > unregister_link_notifier(ndev);
> > > > > > > _vdpa_unregister_device(dev);
> > > > > > > +
> > > > > > > + down_write(&ndev->reslock);
> > > > > > > + teardown_vq_resources(ndev);
> > > > > > > + up_write(&ndev->reslock);
> > > > > > > +
> > > > > > > wq = mvdev->wq;
> > > > > > > mvdev->wq = NULL;
> > > > > > > destroy_workqueue(wq);
> > > > > > >
> > > > > > > --
> > > > > > > 2.45.1
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> Thanks,
> Dragos
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-06-17 15:07 ` [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Dragos Tatulea
2024-06-19 15:54 ` Eugenio Perez Martin
@ 2024-07-08 16:22 ` Zhu Yanjun
2024-07-08 16:43 ` Dragos Tatulea
1 sibling, 1 reply; 53+ messages in thread
From: Zhu Yanjun @ 2024-07-08 16:22 UTC (permalink / raw)
To: Dragos Tatulea, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
Eugenio Pérez, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Si-Wei Liu
Cc: virtualization, linux-kernel, linux-rdma, netdev, Cosmin Ratiu
On 2024/6/17 17:07, Dragos Tatulea wrote:
> Currently, hardware VQs are created right when the vdpa device gets into
> DRIVER_OK state. That is easier because most of the VQ state is known by
> then.
>
> This patch switches to creating all VQs and their associated resources
> at device creation time. The motivation is to reduce the vdpa device
> live migration downtime by moving the expensive operation of creating
> all the hardware VQs and their associated resources out of downtime on
> the destination VM.
Hi, Dragos Tatulea
From the above, when a device is created, all the VQs and their
associated resources are also created.
If VM live migration does not occur, how many resources are wasted?
I mean, to achieve a better downtime, how many resources are used?
"
On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ
resource creation + resume time was ~370ms. Now it's down to 60 ms
(only VQ config and resume). The measurements were done on a ConnectX6DX
based vDPA device.
"
From the above, the performance is amazing.
If we expect to use it in production hosts, how many resources
should we prepare to achieve this downtime?
Zhu Yanjun
>
> The VQs are now created in a blank state. The VQ configuration will
> happen later, on DRIVER_OK. Then the configuration will be applied when
> the VQs are moved to the Ready state.
>
> When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> needed: now that the VQ is already created, a resume_vq() will be
> triggered too early when no mr has been configured yet. Skip calling
> resume_vq() in this case and let it be handled during DRIVER_OK.
>
> For virtio-vdpa, the device configuration is done earlier during
> .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> setup_vq_resources() a second time in that case.
>
> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> 1 file changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 249b5afbe34a..b2836fd3d1dd 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> mvq = &ndev->vqs[idx];
> if (!ready) {
> suspend_vq(ndev, mvq);
> - } else {
> + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> if (resume_vq(ndev, mvq))
> ready = false;
> }
> @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> goto err_setup;
> }
> register_link_notifier(ndev);
> - err = setup_vq_resources(ndev, true);
> - if (err) {
> - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> - goto err_driver;
> + if (ndev->setup) {
> + err = resume_vqs(ndev);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> + goto err_driver;
> + }
> + } else {
> + err = setup_vq_resources(ndev, true);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> + goto err_driver;
> + }
> }
> } else {
> mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> if (mlx5_vdpa_create_dma_mr(mvdev))
> mlx5_vdpa_warn(mvdev, "create MR failed\n");
> }
> + setup_vq_resources(ndev, false);
> up_write(&ndev->reslock);
>
> return 0;
> @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> goto err_reg;
>
> mgtdev->ndev = ndev;
> +
> + /* For virtio-vdpa, the device was set up during device register. */
> + if (ndev->setup)
> + return 0;
> +
> + down_write(&ndev->reslock);
> + err = setup_vq_resources(ndev, false);
> + up_write(&ndev->reslock);
> + if (err)
> + goto err_setup_vq_res;
> +
> return 0;
>
> +err_setup_vq_res:
> + _vdpa_unregister_device(&mvdev->vdev);
> err_reg:
> destroy_workqueue(mvdev->wq);
> err_res2:
> @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
>
> unregister_link_notifier(ndev);
> _vdpa_unregister_device(dev);
> +
> + down_write(&ndev->reslock);
> + teardown_vq_resources(ndev);
> + up_write(&ndev->reslock);
> +
> wq = mvdev->wq;
> mvdev->wq = NULL;
> destroy_workqueue(wq);
>
* Re: [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time
2024-07-08 16:22 ` Zhu Yanjun
@ 2024-07-08 16:43 ` Dragos Tatulea
0 siblings, 0 replies; 53+ messages in thread
From: Dragos Tatulea @ 2024-07-08 16:43 UTC (permalink / raw)
To: xuanzhuo@linux.alibaba.com, Tariq Toukan, eperezma@redhat.com,
yanjun.zhu@linux.dev, si-wei.liu@oracle.com, mst@redhat.com,
jasowang@redhat.com, Saeed Mahameed, leon@kernel.org
Cc: Cosmin Ratiu, linux-kernel@vger.kernel.org,
virtualization@lists.linux.dev, linux-rdma@vger.kernel.org,
netdev@vger.kernel.org
Hi Zhu Yanjun,
On Mon, 2024-07-08 at 18:22 +0200, Zhu Yanjun wrote:
> On 2024/6/17 17:07, Dragos Tatulea wrote:
> > Currently, hardware VQs are created right when the vdpa device gets into
> > DRIVER_OK state. That is easier because most of the VQ state is known by
> > then.
> >
> > This patch switches to creating all VQs and their associated resources
> > at device creation time. The motivation is to reduce the vdpa device
> > live migration downtime by moving the expensive operation of creating
> > all the hardware VQs and their associated resources out of downtime on
> > the destination VM.
>
> Hi, Dragos Tatulea
>
> From the above, when a device is created, all the VQs and their
> associated resources are also created.
> If VM live migration does not occur, how many resources are wasted?
>
> I mean, to achieve a better downtime, how many resources are used?
>
When you use the vdpa device, no resources are wasted: the HW VQs that were
previously created at VM boot (during the DRIVER_OK state) are now created at
vdpa device add time.
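
Concretely, the DRIVER_OK handling from this patch (condensed from the diff
quoted below, with the per-branch warnings elided) becomes:

	if (ndev->setup) {
		/* HW VQs already exist from device add time; only
		 * configure and resume them. This is the cheap path
		 * that runs inside the downtime window.
		 */
		err = resume_vqs(ndev);
	} else {
		/* No pre-created VQs: create all resources now. */
		err = setup_vq_resources(ndev, true);
	}
	if (err)
		goto err_driver;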
The trade-off is that if you configure a different VQ size, you will pay the
price of re-creating the VQs.
This could be mitigated by adding a default VQ size parameter that is settable
via the vdpa tool, but that part is not implemented in this series.
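
To illustrate the trade-off, here is a minimal hypothetical sketch; the
helpers teardown_vq()/setup_vq() and the exact field names are illustrative,
not necessarily what the driver ends up with:

	/* mlx5 HW VQs cannot be resized in place, so a guest-requested
	 * size that differs from the pre-created default forces a full
	 * destroy + create cycle for that VQ (paid during downtime).
	 */
	static int apply_vq_size(struct mlx5_vdpa_net *ndev,
				 struct mlx5_vdpa_virtqueue *mvq,
				 u16 requested_size)
	{
		if (mvq->num_ent == requested_size)
			return 0;	/* default matched: nothing extra to pay */

		teardown_vq(ndev, mvq);		/* hypothetical helper */
		mvq->num_ent = requested_size;
		return setup_vq(ndev, mvq);	/* hypothetical helper */
	}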
Ah, one more thing to keep in mind: the MSIX interrupts will now be allocated
at vdpa device creation time instead of at VM startup.
> "
> On a 64 CPU, 256 GB VM with 1 vDPA device of 16 VQps, the full VQ
> resource creation + resume time was ~370ms. Now it's down to 60 ms
> (only VQ config and resume). The measurements were done on a ConnectX6DX
> based vDPA device.
> "
> From the above, the performance is amazing.
> If we expect to use it in production hosts, how many resources
> should we prepare to achieve this downtime?
>
You do need the latest FW (22.41.1000) to get the full benefit of the
optimization.
Thanks,
Dragos
> Zhu Yanjun
>
> >
> > The VQs are now created in a blank state. The VQ configuration will
> > happen later, on DRIVER_OK. Then the configuration will be applied when
> > the VQs are moved to the Ready state.
> >
> > When .set_vq_ready() is called on a VQ before DRIVER_OK, special care is
> > needed: now that the VQ is already created, a resume_vq() will be
> > triggered too early when no mr has been configured yet. Skip calling
> > resume_vq() in this case and let it be handled during DRIVER_OK.
> >
> > For virtio-vdpa, the device configuration is done earlier during
> > .vdpa_dev_add() by vdpa_register_device(). Avoid calling
> > setup_vq_resources() a second time in that case.
> >
> > Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 37 ++++++++++++++++++++++++++++++++-----
> > 1 file changed, 32 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 249b5afbe34a..b2836fd3d1dd 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -2444,7 +2444,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
> > mvq = &ndev->vqs[idx];
> > if (!ready) {
> > suspend_vq(ndev, mvq);
> > - } else {
> > + } else if (mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > if (resume_vq(ndev, mvq))
> > ready = false;
> > }
> > @@ -3078,10 +3078,18 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> > goto err_setup;
> > }
> > register_link_notifier(ndev);
> > - err = setup_vq_resources(ndev, true);
> > - if (err) {
> > - mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > - goto err_driver;
> > + if (ndev->setup) {
> > + err = resume_vqs(ndev);
> > + if (err) {
> > + mlx5_vdpa_warn(mvdev, "failed to resume VQs\n");
> > + goto err_driver;
> > + }
> > + } else {
> > + err = setup_vq_resources(ndev, true);
> > + if (err) {
> > + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> > + goto err_driver;
> > + }
> > }
> > } else {
> > mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> > @@ -3142,6 +3150,7 @@ static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
> > if (mlx5_vdpa_create_dma_mr(mvdev))
> > mlx5_vdpa_warn(mvdev, "create MR failed\n");
> > }
> > + setup_vq_resources(ndev, false);
> > up_write(&ndev->reslock);
> >
> > return 0;
> > @@ -3836,8 +3845,21 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
> > goto err_reg;
> >
> > mgtdev->ndev = ndev;
> > +
> > + /* For virtio-vdpa, the device was set up during device register. */
> > + if (ndev->setup)
> > + return 0;
> > +
> > + down_write(&ndev->reslock);
> > + err = setup_vq_resources(ndev, false);
> > + up_write(&ndev->reslock);
> > + if (err)
> > + goto err_setup_vq_res;
> > +
> > return 0;
> >
> > +err_setup_vq_res:
> > + _vdpa_unregister_device(&mvdev->vdev);
> > err_reg:
> > destroy_workqueue(mvdev->wq);
> > err_res2:
> > @@ -3863,6 +3885,11 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *
> >
> > unregister_link_notifier(ndev);
> > _vdpa_unregister_device(dev);
> > +
> > + down_write(&ndev->reslock);
> > + teardown_vq_resources(ndev);
> > + up_write(&ndev->reslock);
> > +
> > wq = mvdev->wq;
> > mvdev->wq = NULL;
> > destroy_workqueue(wq);
> >
>
End of thread (newest message: 2024-07-08 16:43 UTC).
Thread overview: 53+ messages --
2024-06-17 15:07 [PATCH vhost 00/23] vdpa/mlx5: Pre-create HW VQs to reduce LM downtime Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 01/23] vdpa/mlx5: Clarify meaning thorough function rename Dragos Tatulea
2024-06-19 10:37 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 02/23] vdpa/mlx5: Make setup/teardown_vq_resources() symmetrical Dragos Tatulea
2024-06-19 10:38 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 03/23] vdpa/mlx5: Drop redundant code Dragos Tatulea
2024-06-19 10:55 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 04/23] vdpa/mlx5: Drop redundant check in teardown_virtqueues() Dragos Tatulea
2024-06-19 10:56 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 05/23] vdpa/mlx5: Iterate over active VQs during suspend/resume Dragos Tatulea
2024-06-19 11:04 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 06/23] vdpa/mlx5: Remove duplicate suspend code Dragos Tatulea
2024-06-19 11:02 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 07/23] vdpa/mlx5: Initialize and reset device with one queue pair Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 08/23] vdpa/mlx5: Clear and reinitialize software VQ data on reset Dragos Tatulea
2024-06-19 11:28 ` Eugenio Perez Martin
2024-06-19 17:03 ` Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 09/23] vdpa/mlx5: Add support for modifying the virtio_version VQ field Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 10/23] vdpa/mlx5: Add support for modifying the VQ features field Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 11/23] vdpa/mlx5: Set an initial size on the VQ Dragos Tatulea
2024-06-19 15:08 ` Eugenio Perez Martin
2024-06-19 17:06 ` Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 12/23] vdpa/mlx5: Start off rqt_size with max VQPs Dragos Tatulea
2024-06-19 15:33 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 13/23] vdpa/mlx5: Set mkey modified flags on all VQs Dragos Tatulea
2024-06-19 15:33 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 14/23] vdpa/mlx5: Allow creation of blank VQs Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 15/23] vdpa/mlx5: Accept Init -> Ready VQ transition in resume_vq() Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 16/23] vdpa/mlx5: Add error code for suspend/resume VQ Dragos Tatulea
2024-06-19 15:41 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 17/23] vdpa/mlx5: Consolidate all VQ modify to Ready to use resume_vq() Dragos Tatulea
2024-06-19 15:43 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 18/23] vdpa/mlx5: Forward error in suspend/resume device Dragos Tatulea
2024-06-23 11:19 ` Zhu Yanjun
2024-06-26 9:28 ` Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 19/23] vdpa/mlx5: Use suspend/resume during VQP change Dragos Tatulea
2024-06-19 15:46 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 20/23] vdpa/mlx5: Pre-create hardware VQs at vdpa .dev_add time Dragos Tatulea
2024-06-19 15:54 ` Eugenio Perez Martin
2024-06-26 9:27 ` Dragos Tatulea
2024-07-03 16:01 ` Eugenio Perez Martin
2024-07-08 11:01 ` Dragos Tatulea
2024-07-08 11:11 ` Michael S. Tsirkin
2024-07-08 11:17 ` Dragos Tatulea
2024-07-08 11:25 ` Michael S. Tsirkin
2024-07-08 16:22 ` Zhu Yanjun
2024-07-08 16:43 ` Dragos Tatulea
2024-06-17 15:07 ` [PATCH vhost 21/23] vdpa/mlx5: Re-create HW VQs under certain conditions Dragos Tatulea
2024-06-19 16:04 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 22/23] vdpa/mlx5: Don't reset VQs more than necessary Dragos Tatulea
2024-06-19 16:14 ` Eugenio Perez Martin
2024-06-17 15:07 ` [PATCH vhost 23/23] vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() Dragos Tatulea
2024-06-19 16:39 ` Eugenio Perez Martin