virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
       [not found] <20210216162001.83541-1-elic@nvidia.com>
@ 2021-02-17 19:42 ` Si-Wei Liu
  2021-02-17 21:20   ` Michael S. Tsirkin
  2021-02-20  2:42   ` Si-Wei Liu
  0 siblings, 2 replies; 4+ messages in thread
From: Si-Wei Liu @ 2021-02-17 19:42 UTC (permalink / raw)
  To: Eli Cohen, mst, jasowang, linux-kernel, virtualization, netdev



On 2/16/2021 8:20 AM, Eli Cohen wrote:
> When we suspend the VM, the VDPA interface will be reset. When the VM is
> resumed again, clear_virtqueues() will clear the available and used
> indices resulting in hardware virqtqueue objects becoming out of sync.
> We can avoid this function alltogether since qemu will clear them if
> required, e.g. when the VM went through a reboot.
>
> Moreover, since the hw available and used indices should always be
> identical on query and should be restored to the same value same value
> for virtqueues that complete in order, we set the single value provided
> by set_vq_state(). In get_vq_state() we return the value of hardware
> used index.
>
> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> Signed-off-by: Eli Cohen <elic@nvidia.com>
Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>

> ---
>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 ++++-------------
>   1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index b8e9d525d66c..a51b0f86afe2 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>   		return;
>   	}
>   	mvq->avail_idx = attr.available_index;
> +	mvq->used_idx = attr.used_index;
>   }
>   
>   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
>   		return -EINVAL;
>   	}
>   
> +	mvq->used_idx = state->avail_index;
>   	mvq->avail_idx = state->avail_index;
>   	return 0;
>   }
> @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
>   	 * that cares about emulating the index after vq is stopped.
>   	 */
>   	if (!mvq->initialized) {
> -		state->avail_index = mvq->avail_idx;
> +		state->avail_index = mvq->used_idx;
>   		return 0;
>   	}
>   
> @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
>   		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
>   		return err;
>   	}
> -	state->avail_index = attr.available_index;
> +	state->avail_index = attr.used_index;
>   	return 0;
>   }
>   
> @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
>   	}
>   }
>   
> -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
> -{
> -	int i;
> -
> -	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> -		ndev->vqs[i].avail_idx = 0;
> -		ndev->vqs[i].used_idx = 0;
> -	}
> -}
> -
>   /* TODO: cross-endian support */
>   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
>   {
> @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>   	if (!status) {
>   		mlx5_vdpa_info(mvdev, "performing device reset\n");
>   		teardown_driver(ndev);
> -		clear_virtqueues(ndev);
>   		mlx5_vdpa_destroy_mr(&ndev->mvdev);
>   		ndev->mvdev.status = 0;
>   		++mvdev->generation;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
  2021-02-17 19:42 ` [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration Si-Wei Liu
@ 2021-02-17 21:20   ` Michael S. Tsirkin
  2021-02-17 21:51     ` Si-Wei Liu
  2021-02-20  2:42   ` Si-Wei Liu
  1 sibling, 1 reply; 4+ messages in thread
From: Michael S. Tsirkin @ 2021-02-17 21:20 UTC (permalink / raw)
  To: Si-Wei Liu; +Cc: netdev, Eli Cohen, virtualization, linux-kernel

On Wed, Feb 17, 2021 at 11:42:48AM -0800, Si-Wei Liu wrote:
> 
> 
> On 2/16/2021 8:20 AM, Eli Cohen wrote:
> > When we suspend the VM, the VDPA interface will be reset. When the VM is
> > resumed again, clear_virtqueues() will clear the available and used
> > indices resulting in hardware virqtqueue objects becoming out of sync.
> > We can avoid this function alltogether since qemu will clear them if
> > required, e.g. when the VM went through a reboot.
> > 
> > Moreover, since the hw available and used indices should always be
> > identical on query and should be restored to the same value same value
> > for virtqueues that complete in order, we set the single value provided
> > by set_vq_state(). In get_vq_state() we return the value of hardware
> > used index.
> > 
> > Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
> > Signed-off-by: Eli Cohen <elic@nvidia.com>
> Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>


Seems to also fix b35ccebe3ef76168aa2edaa35809c0232cb3578e, right?


> > ---
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 ++++-------------
> >   1 file changed, 4 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index b8e9d525d66c..a51b0f86afe2 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
> >   		return;
> >   	}
> >   	mvq->avail_idx = attr.available_index;
> > +	mvq->used_idx = attr.used_index;
> >   }
> >   static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> > @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
> >   		return -EINVAL;
> >   	}
> > +	mvq->used_idx = state->avail_index;
> >   	mvq->avail_idx = state->avail_index;
> >   	return 0;
> >   }
> > @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
> >   	 * that cares about emulating the index after vq is stopped.
> >   	 */
> >   	if (!mvq->initialized) {
> > -		state->avail_index = mvq->avail_idx;
> > +		state->avail_index = mvq->used_idx;
> >   		return 0;
> >   	}
> > @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
> >   		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
> >   		return err;
> >   	}
> > -	state->avail_index = attr.available_index;
> > +	state->avail_index = attr.used_index;
> >   	return 0;
> >   }
> > @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
> >   	}
> >   }
> > -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
> > -{
> > -	int i;
> > -
> > -	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> > -		ndev->vqs[i].avail_idx = 0;
> > -		ndev->vqs[i].used_idx = 0;
> > -	}
> > -}
> > -
> >   /* TODO: cross-endian support */
> >   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
> >   {
> > @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> >   	if (!status) {
> >   		mlx5_vdpa_info(mvdev, "performing device reset\n");
> >   		teardown_driver(ndev);
> > -		clear_virtqueues(ndev);
> >   		mlx5_vdpa_destroy_mr(&ndev->mvdev);
> >   		ndev->mvdev.status = 0;
> >   		++mvdev->generation;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
  2021-02-17 21:20   ` Michael S. Tsirkin
@ 2021-02-17 21:51     ` Si-Wei Liu
  0 siblings, 0 replies; 4+ messages in thread
From: Si-Wei Liu @ 2021-02-17 21:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, Eli Cohen, virtualization, linux-kernel



On 2/17/2021 1:20 PM, Michael S. Tsirkin wrote:
> On Wed, Feb 17, 2021 at 11:42:48AM -0800, Si-Wei Liu wrote:
>>
>> On 2/16/2021 8:20 AM, Eli Cohen wrote:
>>> When we suspend the VM, the VDPA interface will be reset. When the VM is
>>> resumed again, clear_virtqueues() will clear the available and used
>>> indices resulting in hardware virqtqueue objects becoming out of sync.
>>> We can avoid this function alltogether since qemu will clear them if
>>> required, e.g. when the VM went through a reboot.
>>>
>>> Moreover, since the hw available and used indices should always be
>>> identical on query and should be restored to the same value same value
>>> for virtqueues that complete in order, we set the single value provided
>>> by set_vq_state(). In get_vq_state() we return the value of hardware
>>> used index.
>>>
>>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
>>> Signed-off-by: Eli Cohen <elic@nvidia.com>
>> Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>
>
> Seems to also fix b35ccebe3ef76168aa2edaa35809c0232cb3578e, right?
I think so. It should have both "Fixes" tags.

-Siwei
>
>>> ---
>>>    drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 ++++-------------
>>>    1 file changed, 4 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> index b8e9d525d66c..a51b0f86afe2 100644
>>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>>> @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
>>>    		return;
>>>    	}
>>>    	mvq->avail_idx = attr.available_index;
>>> +	mvq->used_idx = attr.used_index;
>>>    }
>>>    static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>> @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
>>>    		return -EINVAL;
>>>    	}
>>> +	mvq->used_idx = state->avail_index;
>>>    	mvq->avail_idx = state->avail_index;
>>>    	return 0;
>>>    }
>>> @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
>>>    	 * that cares about emulating the index after vq is stopped.
>>>    	 */
>>>    	if (!mvq->initialized) {
>>> -		state->avail_index = mvq->avail_idx;
>>> +		state->avail_index = mvq->used_idx;
>>>    		return 0;
>>>    	}
>>> @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa
>>>    		mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
>>>    		return err;
>>>    	}
>>> -	state->avail_index = attr.available_index;
>>> +	state->avail_index = attr.used_index;
>>>    	return 0;
>>>    }
>>> @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
>>>    	}
>>>    }
>>> -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
>>> -{
>>> -	int i;
>>> -
>>> -	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
>>> -		ndev->vqs[i].avail_idx = 0;
>>> -		ndev->vqs[i].used_idx = 0;
>>> -	}
>>> -}
>>> -
>>>    /* TODO: cross-endian support */
>>>    static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
>>>    {
>>> @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
>>>    	if (!status) {
>>>    		mlx5_vdpa_info(mvdev, "performing device reset\n");
>>>    		teardown_driver(ndev);
>>> -		clear_virtqueues(ndev);
>>>    		mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>>    		ndev->mvdev.status = 0;
>>>    		++mvdev->generation;

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration
  2021-02-17 19:42 ` [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration Si-Wei Liu
  2021-02-17 21:20   ` Michael S. Tsirkin
@ 2021-02-20  2:42   ` Si-Wei Liu
  1 sibling, 0 replies; 4+ messages in thread
From: Si-Wei Liu @ 2021-02-20  2:42 UTC (permalink / raw)
  To: Eli Cohen, jasowang; +Cc: netdev, virtualization, linux-kernel, mst



On 2/17/2021 11:42 AM, Si-Wei Liu wrote:
>
>
> On 2/16/2021 8:20 AM, Eli Cohen wrote:
>> When we suspend the VM, the VDPA interface will be reset. When the VM is
>> resumed again, clear_virtqueues() will clear the available and used
>> indices resulting in hardware virqtqueue objects becoming out of sync.
>> We can avoid this function alltogether since qemu will clear them if
>> required, e.g. when the VM went through a reboot.
>>
>> Moreover, since the hw available and used indices should always be
>> identical on query and should be restored to the same value same value
>> for virtqueues that complete in order, we set the single value provided
>> by set_vq_state(). In get_vq_state() we return the value of hardware
>> used index.
>>
>> Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 
>> devices")
>> Signed-off-by: Eli Cohen <elic@nvidia.com>
> Acked-by: Si-Wei Liu <si-wei.liu@oracle.com>

I'd take it back for the moment, according to Jason's latest comment. 
Discussion still going.

>
>> ---
>>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 ++++-------------
>>   1 file changed, 4 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
>> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> index b8e9d525d66c..a51b0f86afe2 100644
>> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
>> @@ -1169,6 +1169,7 @@ static void suspend_vq(struct mlx5_vdpa_net 
>> *ndev, struct mlx5_vdpa_virtqueue *m
>>           return;
>>       }
>>       mvq->avail_idx = attr.available_index;
>> +    mvq->used_idx = attr.used_index;
>>   }
>>     static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>> @@ -1426,6 +1427,7 @@ static int mlx5_vdpa_set_vq_state(struct 
>> vdpa_device *vdev, u16 idx,
>>           return -EINVAL;
>>       }
>>   +    mvq->used_idx = state->avail_index;
>>       mvq->avail_idx = state->avail_index;
This is where things starts getting interesting. According to Jason, the 
original expectation of this API would be to restore the internal 
last_avail_index in the hardware. With Mellanox network vDPA hardware 
implementation, there's no such last_avail_index thing in the hardware 
but only a single last_used_index representing both indices, which 
should always be the same and in sync with each other. So from migration 
point of view, it appears logical to restore the saved last_avail_index 
to the last_used_index in the hardware, right? But what is the point to 
restore mvq->avail_idx?

Actually, this code path is being repurposed to address the index reset 
problem in the device reset scenario. Where the mvq->avail_idx and 
mvq->used_idx are both reset to 0 once device is reset. This is a bit 
crossing the boundary to me.


>>       return 0;
>>   }
>> @@ -1443,7 +1445,7 @@ static int mlx5_vdpa_get_vq_state(struct 
>> vdpa_device *vdev, u16 idx, struct vdpa
>>        * that cares about emulating the index after vq is stopped.
>>        */
>>       if (!mvq->initialized) {
>> -        state->avail_index = mvq->avail_idx;
>> +        state->avail_index = mvq->used_idx;
This is where the last_used_index gets loaded from the hardware (when 
device is stopped).

>>           return 0;
>>       }
>>   @@ -1452,7 +1454,7 @@ static int mlx5_vdpa_get_vq_state(struct 
>> vdpa_device *vdev, u16 idx, struct vdpa
>>           mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
>>           return err;
>>       }
>> -    state->avail_index = attr.available_index;
>> +    state->avail_index = attr.used_index;
This code path never gets called from userspace (when device is running).

-Siwei

>>       return 0;
>>   }
>>   @@ -1532,16 +1534,6 @@ static void teardown_virtqueues(struct 
>> mlx5_vdpa_net *ndev)
>>       }
>>   }
>>   -static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
>> -{
>> -    int i;
>> -
>> -    for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
>> -        ndev->vqs[i].avail_idx = 0;
>> -        ndev->vqs[i].used_idx = 0;
>> -    }
>> -}
>> -
>>   /* TODO: cross-endian support */
>>   static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev 
>> *mvdev)
>>   {
>> @@ -1777,7 +1769,6 @@ static void mlx5_vdpa_set_status(struct 
>> vdpa_device *vdev, u8 status)
>>       if (!status) {
>>           mlx5_vdpa_info(mvdev, "performing device reset\n");
>>           teardown_driver(ndev);
>> -        clear_virtqueues(ndev);
>>           mlx5_vdpa_destroy_mr(&ndev->mvdev);
>>           ndev->mvdev.status = 0;
>>           ++mvdev->generation;
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-02-20  2:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20210216162001.83541-1-elic@nvidia.com>
2021-02-17 19:42 ` [PATCH 1/2] vdpa/mlx5: Fix suspend/resume index restoration Si-Wei Liu
2021-02-17 21:20   ` Michael S. Tsirkin
2021-02-17 21:51     ` Si-Wei Liu
2021-02-20  2:42   ` Si-Wei Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).