* [PATCH 0/2] Fix the virito features negotiation flaw
@ 2022-09-26 6:36 huangy81
2022-09-26 6:36 ` [PATCH 1/2] vhost-user: Refactor vhost acked features saving huangy81
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: huangy81 @ 2022-09-26 6:36 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S . Tsirkin, Jason Wang, tugy, baiyw2, dengpc12, liuym16,
yangchg, yuanmh12, zhoupx, huangy81, lic121
From: "Hyman Huang(黄勇)" <huangy81@chinatelecom.cn>
This patchset aim to fix the unexpected negotiation features for
vhost-user netdev interface.
Steps to reproduce the issue:
Prepare a vm (CentOS 8 in my work scenario) with vhost-user
backend interface and configure qemu as server mode. So dpdk
would connect qemu's unix socket periodically.
1. start vm in background and restart openvswitch service
concurrently and repeatedly in the process of vm start.
2. check if negotiated virtio features of port is "0x40000000" at
dpdk side by executing:
ovs-vsctl list interface | grep features | grep {port_socket_path}
3. if features equals "0x40000000", go to the vm and check if sending
arp package works, executing:
arping {IP_ADDR}
if vm interface is configured to boot with dhcp protocol, it
would get no ip.
After doing above steps, we'll find the arping not work, the ovs on
host side has forwarded unexpected arp packages, which be added 0x0000
in the head of ethenet frame. Though qemu report some error when
read/write cmd of vhost protocol during the process of vm start,
like the following:
"Failed to set msg fds"
"vhost VQ 0 ring restore failed: -22: Invalid argument (22)"
The vm does not stop or report more suggestive error message, it
seems that everthing is ok.
The root cause is that dpdk port negotiated nothing but only one
VHOST_USER_F_PROTOCOL_FEATURES feature with vhost-user interface at
qemu side, which is an unexpected behavior. qemu only load the
VHOST_USER_F_PROTOCOL_FEATURES when VHOST_USER_SET_FEATURES and loss
the guest features configured by front-end virtio driver using the
VIRTIO_PCI_COMMON_GF addr, which is stored in acked_features field
of struct vhost_dev.
To explain how the acked_features disappear, we may need to know the
lifecyle of acked_features in vhost_dev during feature negotiation.
1. qemu init acked_features field of struct vhost_dev in vhost_net_init()
by calling vhost_net_ack_features(), the init value fetched from
acked_features field of struct NetVhostUserState, which is the backup
role after vhost stopping or unix socket closed.
In the first time, the acked_features of struct NetVhostUserState is 0
so the init value of vhost_dev's acked_features also 0.
2. when guest virtio driver set features, qemu accept the features and
call virtio_set_features to store the features as acked_features in
vhost_dev.
3. when unix socket closed or vhost_dev device doesn't work and be
stopped unexpectedly, qemu will call chr_closed_bh or vhost_user_stop,
which will copy acked_features from vhost_dev to NetVhostUserState and
cleanup the vhost_dev. Since virtio driver not allowed to set features
once status of virtio device changes to VIRTIO_CONFIG_S_FEATURE_OK,
qemu need to backup it in case of loss.
4. once unix socket return to normal and get connected, qemu will
call vhost_user_start to restore the vhost_dev and fetch the
acked_features stored in NetVhostUserState previously.
The above flow works fine in the normal scenarios, but it doesn't cover
the scenario that openvswitch service restart in the same time of
virtio features negotiation.
Let's analyze such scenario:
qemu dpdk
vhost_net_init()
| systemctl stop openvswitch.service
virtio_set_features() |
| systemctl start openvswitch.service
virtio_set_status()
Ovs stop service before guset setting virtio features, chr_closed_bh()
be called and fetch acked_features in vhost_dev, if may store the
incomplete features to NetVhostUserState since it doesn't include
guest features, eg "0x40000000".
Guest set virtio features with another features, eg "0x7060a782",
this value will store in acked_features of vhost_dev, which is the
right and up-to-date features.
After ovs service show up, qemu unix socket get connected and call
vhost_user_start(), which will restore acked_features of vhost_dev
using NetVhostUserState and "0x40000000" be loaded, which is obsolete.
Guest set virtio device status and therefore qemu call
virtio_net_vhost_status finally, checking if vhost-net device has
started, start it if not, consequently the obsolete acked_features
"0x40000000" be negotiated after calling vhost_dev_set_features().
So the key point of solving this issue making the acked_features
in NetVhostUserState up-to-date, these patchset provide this
solution.
[PATCH 1/2]: Abstract the existing code of saving acked_features
into vhost_user_save_acked_features so the next
patch seems clean.
[PATCH 2/2]: Save the acked_features to NetVhostUserState once
Guest virtio driver configured. This step makes
acked_features in NetVhostUserState up-to-date.
Please review, any comments and suggestions are welcome.
Best regard.
Yong
Hyman Huang (2):
vhost-user: Refactor vhost acked features saving
vhost-net: Fix the virito features negotiation flaw
hw/net/vhost_net.c | 9 +++++++++
hw/net/virtio-net.c | 5 +++++
include/net/vhost-user.h | 2 ++
include/net/vhost_net.h | 2 ++
net/vhost-user.c | 35 +++++++++++++++++++----------------
5 files changed, 37 insertions(+), 16 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] vhost-user: Refactor vhost acked features saving
2022-09-26 6:36 [PATCH 0/2] Fix the virito features negotiation flaw huangy81
@ 2022-09-26 6:36 ` huangy81
2022-10-07 14:01 ` Michael S. Tsirkin
2022-09-26 6:36 ` [PATCH 2/2] vhost-net: Fix the virito features negotiation flaw huangy81
2022-10-05 8:17 ` [PATCH 0/2] " Hyman Huang
2 siblings, 1 reply; 9+ messages in thread
From: huangy81 @ 2022-09-26 6:36 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S . Tsirkin, Jason Wang, tugy, baiyw2, dengpc12, liuym16,
yangchg, yuanmh12, zhoupx, huangy81, lic121
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Abstract vhost acked features saving into
vhost_user_save_acked_features, export it as util function.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
---
include/net/vhost-user.h | 2 ++
net/vhost-user.c | 35 +++++++++++++++++++----------------
2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
index 5bcd8a6285..00d46613d3 100644
--- a/include/net/vhost-user.h
+++ b/include/net/vhost-user.h
@@ -14,5 +14,7 @@
struct vhost_net;
struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
uint64_t vhost_user_get_acked_features(NetClientState *nc);
+void vhost_user_save_acked_features(NetClientState *nc,
+ bool cleanup);
#endif /* VHOST_USER_H */
diff --git a/net/vhost-user.c b/net/vhost-user.c
index b1a0247b59..c512cc9727 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
return s->acked_features;
}
-static void vhost_user_stop(int queues, NetClientState *ncs[])
+void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
{
NetVhostUserState *s;
+
+ s = DO_UPCAST(NetVhostUserState, nc, nc);
+ if (s->vhost_net) {
+ uint64_t features = vhost_net_get_acked_features(s->vhost_net);
+ if (features) {
+ s->acked_features = features;
+ }
+
+ if (cleanup) {
+ vhost_net_cleanup(s->vhost_net);
+ }
+ }
+}
+
+static void vhost_user_stop(int queues, NetClientState *ncs[])
+{
int i;
for (i = 0; i < queues; i++) {
assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
- s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
-
- if (s->vhost_net) {
- /* save acked features */
- uint64_t features = vhost_net_get_acked_features(s->vhost_net);
- if (features) {
- s->acked_features = features;
- }
- vhost_net_cleanup(s->vhost_net);
- }
+ vhost_user_save_acked_features(ncs[i], true);
}
}
@@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
for (i = queues -1; i >= 0; i--) {
- s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
-
- if (s->vhost_net) {
- s->acked_features = vhost_net_get_acked_features(s->vhost_net);
- }
+ vhost_user_save_acked_features(ncs[i], false);
}
qmp_set_link(name, false, &err);
--
2.27.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] vhost-net: Fix the virito features negotiation flaw
2022-09-26 6:36 [PATCH 0/2] Fix the virito features negotiation flaw huangy81
2022-09-26 6:36 ` [PATCH 1/2] vhost-user: Refactor vhost acked features saving huangy81
@ 2022-09-26 6:36 ` huangy81
2022-10-05 8:17 ` [PATCH 0/2] " Hyman Huang
2 siblings, 0 replies; 9+ messages in thread
From: huangy81 @ 2022-09-26 6:36 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S . Tsirkin, Jason Wang, tugy, baiyw2, dengpc12, liuym16,
yangchg, yuanmh12, zhoupx, huangy81, lic121
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Save the acked_features once it be configured by guest
virtio driver so it can't miss any features.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
---
hw/net/vhost_net.c | 9 +++++++++
hw/net/virtio-net.c | 5 +++++
include/net/vhost_net.h | 2 ++
3 files changed, 16 insertions(+)
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index d28f8b974b..2bffc276b9 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -141,6 +141,15 @@ uint64_t vhost_net_get_acked_features(VHostNetState *net)
return net->dev.acked_features;
}
+void vhost_net_save_acked_features(NetClientState *nc)
+{
+ if (nc->info->type != NET_CLIENT_DRIVER_VHOST_USER) {
+ return;
+ }
+
+ vhost_user_save_acked_features(nc, false);
+}
+
static int vhost_net_get_fd(NetClientState *backend)
{
switch (backend->info->type) {
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index dd0d056fde..69c00b4e74 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -918,6 +918,11 @@ static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
continue;
}
vhost_net_ack_features(get_vhost_net(nc->peer), features);
+ /*
+ * keep acked_features in NetVhostUserState up-to-date so it
+ * can't miss any features configured by guest virtio driver.
+ */
+ vhost_net_save_acked_features(nc->peer);
}
if (virtio_has_feature(features, VIRTIO_NET_F_CTRL_VLAN)) {
diff --git a/include/net/vhost_net.h b/include/net/vhost_net.h
index 387e913e4e..3a5579b075 100644
--- a/include/net/vhost_net.h
+++ b/include/net/vhost_net.h
@@ -46,6 +46,8 @@ int vhost_set_vring_enable(NetClientState * nc, int enable);
uint64_t vhost_net_get_acked_features(VHostNetState *net);
+void vhost_net_save_acked_features(NetClientState *nc);
+
int vhost_net_set_mtu(struct vhost_net *net, uint16_t mtu);
#endif
--
2.27.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Fix the virito features negotiation flaw
2022-09-26 6:36 [PATCH 0/2] Fix the virito features negotiation flaw huangy81
2022-09-26 6:36 ` [PATCH 1/2] vhost-user: Refactor vhost acked features saving huangy81
2022-09-26 6:36 ` [PATCH 2/2] vhost-net: Fix the virito features negotiation flaw huangy81
@ 2022-10-05 8:17 ` Hyman Huang
2022-10-26 15:00 ` Michael S. Tsirkin
2 siblings, 1 reply; 9+ messages in thread
From: Hyman Huang @ 2022-10-05 8:17 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S . Tsirkin, Jason Wang, tugy, baiyw2, dengpc12, liuym16,
yangchg, yuanmh12, zhoupx, lic121
Ping,
Hi, Michael and Jason, how does this patchset feel think? :)
Sorry if i made noise.
Yong
在 2022/9/26 14:36, huangy81@chinatelecom.cn 写道:
> From: "Hyman Huang(黄勇)" <huangy81@chinatelecom.cn>
>
> This patchset aim to fix the unexpected negotiation features for
> vhost-user netdev interface.
>
> Steps to reproduce the issue:
> Prepare a vm (CentOS 8 in my work scenario) with vhost-user
> backend interface and configure qemu as server mode. So dpdk
> would connect qemu's unix socket periodically.
>
> 1. start vm in background and restart openvswitch service
> concurrently and repeatedly in the process of vm start.
>
> 2. check if negotiated virtio features of port is "0x40000000" at
> dpdk side by executing:
> ovs-vsctl list interface | grep features | grep {port_socket_path}
>
> 3. if features equals "0x40000000", go to the vm and check if sending
> arp package works, executing:
> arping {IP_ADDR}
> if vm interface is configured to boot with dhcp protocol, it
> would get no ip.
>
> After doing above steps, we'll find the arping not work, the ovs on
> host side has forwarded unexpected arp packages, which be added 0x0000
> in the head of ethenet frame. Though qemu report some error when
> read/write cmd of vhost protocol during the process of vm start,
> like the following:
>
> "Failed to set msg fds"
> "vhost VQ 0 ring restore failed: -22: Invalid argument (22)"
>
> The vm does not stop or report more suggestive error message, it
> seems that everthing is ok.
>
> The root cause is that dpdk port negotiated nothing but only one
> VHOST_USER_F_PROTOCOL_FEATURES feature with vhost-user interface at
> qemu side, which is an unexpected behavior. qemu only load the
> VHOST_USER_F_PROTOCOL_FEATURES when VHOST_USER_SET_FEATURES and loss
> the guest features configured by front-end virtio driver using the
> VIRTIO_PCI_COMMON_GF addr, which is stored in acked_features field
> of struct vhost_dev.
>
> To explain how the acked_features disappear, we may need to know the
> lifecyle of acked_features in vhost_dev during feature negotiation.
>
> 1. qemu init acked_features field of struct vhost_dev in vhost_net_init()
> by calling vhost_net_ack_features(), the init value fetched from
> acked_features field of struct NetVhostUserState, which is the backup
> role after vhost stopping or unix socket closed.
> In the first time, the acked_features of struct NetVhostUserState is 0
> so the init value of vhost_dev's acked_features also 0.
>
> 2. when guest virtio driver set features, qemu accept the features and
> call virtio_set_features to store the features as acked_features in
> vhost_dev.
>
> 3. when unix socket closed or vhost_dev device doesn't work and be
> stopped unexpectedly, qemu will call chr_closed_bh or vhost_user_stop,
> which will copy acked_features from vhost_dev to NetVhostUserState and
> cleanup the vhost_dev. Since virtio driver not allowed to set features
> once status of virtio device changes to VIRTIO_CONFIG_S_FEATURE_OK,
> qemu need to backup it in case of loss.
>
> 4. once unix socket return to normal and get connected, qemu will
> call vhost_user_start to restore the vhost_dev and fetch the
> acked_features stored in NetVhostUserState previously.
>
> The above flow works fine in the normal scenarios, but it doesn't cover
> the scenario that openvswitch service restart in the same time of
> virtio features negotiation.
>
> Let's analyze such scenario:
> qemu dpdk
>
> vhost_net_init()
> | systemctl stop openvswitch.service
> virtio_set_features() |
> | systemctl start openvswitch.service
> virtio_set_status()
>
> Ovs stop service before guset setting virtio features, chr_closed_bh()
> be called and fetch acked_features in vhost_dev, if may store the
> incomplete features to NetVhostUserState since it doesn't include
> guest features, eg "0x40000000".
>
> Guest set virtio features with another features, eg "0x7060a782",
> this value will store in acked_features of vhost_dev, which is the
> right and up-to-date features.
>
> After ovs service show up, qemu unix socket get connected and call
> vhost_user_start(), which will restore acked_features of vhost_dev
> using NetVhostUserState and "0x40000000" be loaded, which is obsolete.
>
> Guest set virtio device status and therefore qemu call
> virtio_net_vhost_status finally, checking if vhost-net device has
> started, start it if not, consequently the obsolete acked_features
> "0x40000000" be negotiated after calling vhost_dev_set_features().
>
> So the key point of solving this issue making the acked_features
> in NetVhostUserState up-to-date, these patchset provide this
> solution.
>
> [PATCH 1/2]: Abstract the existing code of saving acked_features
> into vhost_user_save_acked_features so the next
> patch seems clean.
>
> [PATCH 2/2]: Save the acked_features to NetVhostUserState once
> Guest virtio driver configured. This step makes
> acked_features in NetVhostUserState up-to-date.
>
> Please review, any comments and suggestions are welcome.
>
> Best regard.
>
> Yong
>
> Hyman Huang (2):
> vhost-user: Refactor vhost acked features saving
> vhost-net: Fix the virito features negotiation flaw
>
> hw/net/vhost_net.c | 9 +++++++++
> hw/net/virtio-net.c | 5 +++++
> include/net/vhost-user.h | 2 ++
> include/net/vhost_net.h | 2 ++
> net/vhost-user.c | 35 +++++++++++++++++++----------------
> 5 files changed, 37 insertions(+), 16 deletions(-)
>
--
Best regard
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving
2022-09-26 6:36 ` [PATCH 1/2] vhost-user: Refactor vhost acked features saving huangy81
@ 2022-10-07 14:01 ` Michael S. Tsirkin
2022-10-07 14:57 ` Stefano Garzarella
2022-10-07 16:58 ` Hyman Huang
0 siblings, 2 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2022-10-07 14:01 UTC (permalink / raw)
To: huangy81
Cc: qemu-devel, Jason Wang, tugy, baiyw2, dengpc12, liuym16, yangchg,
yuanmh12, zhoupx, lic121, Stefano Garzarella, Raphael Norwitz
On Mon, Sep 26, 2022 at 02:36:40PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>
> Abstract vhost acked features saving into
> vhost_user_save_acked_features, export it as util function.
>
> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
> ---
> include/net/vhost-user.h | 2 ++
> net/vhost-user.c | 35 +++++++++++++++++++----------------
> 2 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
> index 5bcd8a6285..00d46613d3 100644
> --- a/include/net/vhost-user.h
> +++ b/include/net/vhost-user.h
> @@ -14,5 +14,7 @@
> struct vhost_net;
> struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
> uint64_t vhost_user_get_acked_features(NetClientState *nc);
> +void vhost_user_save_acked_features(NetClientState *nc,
> + bool cleanup);
>
> #endif /* VHOST_USER_H */
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index b1a0247b59..c512cc9727 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
> return s->acked_features;
> }
>
> -static void vhost_user_stop(int queues, NetClientState *ncs[])
> +void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
> {
> NetVhostUserState *s;
> +
> + s = DO_UPCAST(NetVhostUserState, nc, nc);
> + if (s->vhost_net) {
> + uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> + if (features) {
> + s->acked_features = features;
> + }
Note it does not set acked_features if features are 0.
Which might be the case for legacy ...
I will need to analyze this more to figure out what's
the correct behaviour....
Stefano? Raphael?
> +
> + if (cleanup) {
> + vhost_net_cleanup(s->vhost_net);
> + }
> + }
> +}
> +
> +static void vhost_user_stop(int queues, NetClientState *ncs[])
> +{
> int i;
>
> for (i = 0; i < queues; i++) {
> assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
>
> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
> -
> - if (s->vhost_net) {
> - /* save acked features */
> - uint64_t features = vhost_net_get_acked_features(s->vhost_net);
> - if (features) {
> - s->acked_features = features;
> - }
> - vhost_net_cleanup(s->vhost_net);
> - }
> + vhost_user_save_acked_features(ncs[i], true);
> }
> }
>
> @@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
> s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
>
> for (i = queues -1; i >= 0; i--) {
> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
> -
> - if (s->vhost_net) {
> - s->acked_features = vhost_net_get_acked_features(s->vhost_net);
> - }
> + vhost_user_save_acked_features(ncs[i], false);
> }
>
> qmp_set_link(name, false, &err);
> --
> 2.27.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving
2022-10-07 14:01 ` Michael S. Tsirkin
@ 2022-10-07 14:57 ` Stefano Garzarella
2022-10-07 16:58 ` Hyman Huang
1 sibling, 0 replies; 9+ messages in thread
From: Stefano Garzarella @ 2022-10-07 14:57 UTC (permalink / raw)
To: Michael S. Tsirkin, Marc-André Lureau
Cc: huangy81, qemu-devel, Jason Wang, tugy, baiyw2, dengpc12, liuym16,
yangchg, yuanmh12, zhoupx, lic121, Raphael Norwitz
On Fri, Oct 07, 2022 at 10:01:21AM -0400, Michael S. Tsirkin wrote:
>On Mon, Sep 26, 2022 at 02:36:40PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Abstract vhost acked features saving into
>> vhost_user_save_acked_features, export it as util function.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
>> ---
>> include/net/vhost-user.h | 2 ++
>> net/vhost-user.c | 35 +++++++++++++++++++----------------
>> 2 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
>> index 5bcd8a6285..00d46613d3 100644
>> --- a/include/net/vhost-user.h
>> +++ b/include/net/vhost-user.h
>> @@ -14,5 +14,7 @@
>> struct vhost_net;
>> struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
>> uint64_t vhost_user_get_acked_features(NetClientState *nc);
>> +void vhost_user_save_acked_features(NetClientState *nc,
>> + bool cleanup);
>>
>> #endif /* VHOST_USER_H */
>> diff --git a/net/vhost-user.c b/net/vhost-user.c
>> index b1a0247b59..c512cc9727 100644
>> --- a/net/vhost-user.c
>> +++ b/net/vhost-user.c
>> @@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
>> return s->acked_features;
>> }
>>
>> -static void vhost_user_stop(int queues, NetClientState *ncs[])
>> +void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
>> {
>> NetVhostUserState *s;
>> +
>> + s = DO_UPCAST(NetVhostUserState, nc, nc);
>> + if (s->vhost_net) {
>> + uint64_t features = vhost_net_get_acked_features(s->vhost_net);
>> + if (features) {
>> + s->acked_features = features;
>> + }
>
>Note it does not set acked_features if features are 0.
>Which might be the case for legacy ...
>I will need to analyze this more to figure out what's
>the correct behaviour....
>
>Stefano? Raphael?
I just noticed that we now call vhost_user_stop() only in the error path
of vhost_user_start(). We used it elsewhere, but over time we removed
those calls.
Do we still need to save acked_feature in that path?
About doing it only if the features aren't 0, it isn't clear to me.
I see we did in e6bcb1b617 ("vhost-user: keep vhost_net after a
disconnection"). @Marc-André do you remember why?
Thanks,
Stefano
>
>> +
>> + if (cleanup) {
>> + vhost_net_cleanup(s->vhost_net);
>> + }
>> + }
>> +}
>> +
>> +static void vhost_user_stop(int queues, NetClientState *ncs[])
>> +{
>> int i;
>>
>> for (i = 0; i < queues; i++) {
>> assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
>>
>> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
>> -
>> - if (s->vhost_net) {
>> - /* save acked features */
>> - uint64_t features = vhost_net_get_acked_features(s->vhost_net);
>> - if (features) {
>> - s->acked_features = features;
>> - }
>> - vhost_net_cleanup(s->vhost_net);
>> - }
>> + vhost_user_save_acked_features(ncs[i], true);
>> }
>> }
>>
>> @@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
>> s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
>>
>> for (i = queues -1; i >= 0; i--) {
>> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
>> -
>> - if (s->vhost_net) {
>> - s->acked_features = vhost_net_get_acked_features(s->vhost_net);
>> - }
>> + vhost_user_save_acked_features(ncs[i], false);
>> }
>>
>> qmp_set_link(name, false, &err);
>> --
>> 2.27.0
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] vhost-user: Refactor vhost acked features saving
2022-10-07 14:01 ` Michael S. Tsirkin
2022-10-07 14:57 ` Stefano Garzarella
@ 2022-10-07 16:58 ` Hyman Huang
1 sibling, 0 replies; 9+ messages in thread
From: Hyman Huang @ 2022-10-07 16:58 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Jason Wang, tugy, baiyw2, dengpc12, liuym16, yangchg,
yuanmh12, zhoupx, lic121, Stefano Garzarella, Raphael Norwitz
在 2022/10/7 22:01, Michael S. Tsirkin 写道:
> On Mon, Sep 26, 2022 at 02:36:40PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> Abstract vhost acked features saving into
>> vhost_user_save_acked_features, export it as util function.
>>
>> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>> Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn>
>> ---
>> include/net/vhost-user.h | 2 ++
>> net/vhost-user.c | 35 +++++++++++++++++++----------------
>> 2 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/net/vhost-user.h b/include/net/vhost-user.h
>> index 5bcd8a6285..00d46613d3 100644
>> --- a/include/net/vhost-user.h
>> +++ b/include/net/vhost-user.h
>> @@ -14,5 +14,7 @@
>> struct vhost_net;
>> struct vhost_net *vhost_user_get_vhost_net(NetClientState *nc);
>> uint64_t vhost_user_get_acked_features(NetClientState *nc);
>> +void vhost_user_save_acked_features(NetClientState *nc,
>> + bool cleanup);
>>
>> #endif /* VHOST_USER_H */
>> diff --git a/net/vhost-user.c b/net/vhost-user.c
>> index b1a0247b59..c512cc9727 100644
>> --- a/net/vhost-user.c
>> +++ b/net/vhost-user.c
>> @@ -45,24 +45,31 @@ uint64_t vhost_user_get_acked_features(NetClientState *nc)
>> return s->acked_features;
>> }
>>
>> -static void vhost_user_stop(int queues, NetClientState *ncs[])
>> +void vhost_user_save_acked_features(NetClientState *nc, bool cleanup)
>> {
>> NetVhostUserState *s;
>> +
>> + s = DO_UPCAST(NetVhostUserState, nc, nc);
>> + if (s->vhost_net) {
>> + uint64_t features = vhost_net_get_acked_features(s->vhost_net);
>> + if (features) {
>> + s->acked_features = features;
>> + }
>
> Note it does not set acked_features if features are 0.
> Which might be the case for legacy ...
> I will need to analyze this more to figure out what's
> the correct behaviour....
>
Thanks Michael for commentting. :)
Indeed, backing up acked_features in the two functions chr_closed_bh()
vhost_user_stop() are kind of different as above, it also seems a little
weried for me.
IMHO, we can always keep the acked_features in NetVhostUserState the
same as acked_features in vhost_dev no matter what features are, since
this is the role that acked_features in NetVhostUserState plays and we
can just focus on the validation of acked_features in vhost_dev if
something goes wrong.
Thanks,
Yong
> Stefano? Raphael?
>
>> +
>> + if (cleanup) {
>> + vhost_net_cleanup(s->vhost_net);
>> + }
>> + }
>> +}
>> +
>> +static void vhost_user_stop(int queues, NetClientState *ncs[])
>> +{
>> int i;
>>
>> for (i = 0; i < queues; i++) {
>> assert(ncs[i]->info->type == NET_CLIENT_DRIVER_VHOST_USER);
>>
>> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
>> -
>> - if (s->vhost_net) {
>> - /* save acked features */
>> - uint64_t features = vhost_net_get_acked_features(s->vhost_net);
>> - if (features) {
>> - s->acked_features = features;
>> - }
>> - vhost_net_cleanup(s->vhost_net);
>> - }
>> + vhost_user_save_acked_features(ncs[i], true);
>> }
>> }
>>
>> @@ -251,11 +258,7 @@ static void chr_closed_bh(void *opaque)
>> s = DO_UPCAST(NetVhostUserState, nc, ncs[0]);
>>
>> for (i = queues -1; i >= 0; i--) {
>> - s = DO_UPCAST(NetVhostUserState, nc, ncs[i]);
>> -
>> - if (s->vhost_net) {
>> - s->acked_features = vhost_net_get_acked_features(s->vhost_net);
>> - }
>> + vhost_user_save_acked_features(ncs[i], false);
>> }
>>
>> qmp_set_link(name, false, &err);
>> --
>> 2.27.0
>
--
Best regard
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Fix the virito features negotiation flaw
2022-10-05 8:17 ` [PATCH 0/2] " Hyman Huang
@ 2022-10-26 15:00 ` Michael S. Tsirkin
2022-10-26 16:31 ` Hyman Huang
0 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2022-10-26 15:00 UTC (permalink / raw)
To: Hyman Huang
Cc: qemu-devel, Jason Wang, tugy, baiyw2, dengpc12, liuym16, yangchg,
yuanmh12, zhoupx, lic121
I guess I'll apply this. Can you fix a typo in subject though?
Would also be nice if we had a test for this behaviour.
On Wed, Oct 05, 2022 at 04:17:30PM +0800, Hyman Huang wrote:
> Ping,
> Hi, Michael and Jason, how does this patchset feel think? :)
> Sorry if i made noise.
>
> Yong
>
> 在 2022/9/26 14:36, huangy81@chinatelecom.cn 写道:
> > From: "Hyman Huang(黄勇)" <huangy81@chinatelecom.cn>
> >
> > This patchset aim to fix the unexpected negotiation features for
> > vhost-user netdev interface.
> >
> > Steps to reproduce the issue:
> > Prepare a vm (CentOS 8 in my work scenario) with vhost-user
> > backend interface and configure qemu as server mode. So dpdk
> > would connect qemu's unix socket periodically.
> >
> > 1. start vm in background and restart openvswitch service
> > concurrently and repeatedly in the process of vm start.
> >
> > 2. check if negotiated virtio features of port is "0x40000000" at
> > dpdk side by executing:
> > ovs-vsctl list interface | grep features | grep {port_socket_path}
> > 3. if features equals "0x40000000", go to the vm and check if sending
> > arp package works, executing:
> > arping {IP_ADDR}
> > if vm interface is configured to boot with dhcp protocol, it
> > would get no ip.
> >
> > After doing above steps, we'll find the arping not work, the ovs on
> > host side has forwarded unexpected arp packages, which be added 0x0000
> > in the head of ethenet frame. Though qemu report some error when
> > read/write cmd of vhost protocol during the process of vm start,
> > like the following:
> >
> > "Failed to set msg fds"
> > "vhost VQ 0 ring restore failed: -22: Invalid argument (22)"
> >
> > The vm does not stop or report more suggestive error message, it
> > seems that everthing is ok.
> >
> > The root cause is that dpdk port negotiated nothing but only one
> > VHOST_USER_F_PROTOCOL_FEATURES feature with vhost-user interface at
> > qemu side, which is an unexpected behavior. qemu only load the
> > VHOST_USER_F_PROTOCOL_FEATURES when VHOST_USER_SET_FEATURES and loss
> > the guest features configured by front-end virtio driver using the
> > VIRTIO_PCI_COMMON_GF addr, which is stored in acked_features field
> > of struct vhost_dev.
> >
> > To explain how the acked_features disappear, we may need to know the
> > lifecyle of acked_features in vhost_dev during feature negotiation.
> >
> > 1. qemu init acked_features field of struct vhost_dev in vhost_net_init()
> > by calling vhost_net_ack_features(), the init value fetched from
> > acked_features field of struct NetVhostUserState, which is the backup
> > role after vhost stopping or unix socket closed.
> > In the first time, the acked_features of struct NetVhostUserState is 0
> > so the init value of vhost_dev's acked_features also 0.
> >
> > 2. when guest virtio driver set features, qemu accept the features and
> > call virtio_set_features to store the features as acked_features in
> > vhost_dev.
> >
> > 3. when unix socket closed or vhost_dev device doesn't work and be
> > stopped unexpectedly, qemu will call chr_closed_bh or vhost_user_stop,
> > which will copy acked_features from vhost_dev to NetVhostUserState and
> > cleanup the vhost_dev. Since virtio driver not allowed to set features
> > once status of virtio device changes to VIRTIO_CONFIG_S_FEATURE_OK,
> > qemu need to backup it in case of loss.
> > 4. once unix socket return to normal and get connected, qemu will
> > call vhost_user_start to restore the vhost_dev and fetch the
> > acked_features stored in NetVhostUserState previously.
> >
> > The above flow works fine in the normal scenarios, but it doesn't cover
> > the scenario that openvswitch service restart in the same time of
> > virtio features negotiation.
> >
> > Let's analyze such scenario:
> > qemu dpdk
> >
> > vhost_net_init()
> > | systemctl stop openvswitch.service
> > virtio_set_features() |
> > | systemctl start openvswitch.service
> > virtio_set_status()
> >
> > Ovs stop service before guset setting virtio features, chr_closed_bh()
> > be called and fetch acked_features in vhost_dev, if may store the
> > incomplete features to NetVhostUserState since it doesn't include
> > guest features, eg "0x40000000".
> >
> > Guest set virtio features with another features, eg "0x7060a782",
> > this value will store in acked_features of vhost_dev, which is the
> > right and up-to-date features.
> >
> > After ovs service show up, qemu unix socket get connected and call
> > vhost_user_start(), which will restore acked_features of vhost_dev
> > using NetVhostUserState and "0x40000000" be loaded, which is obsolete.
> >
> > Guest set virtio device status and therefore qemu call
> > virtio_net_vhost_status finally, checking if vhost-net device has
> > started, start it if not, consequently the obsolete acked_features
> > "0x40000000" be negotiated after calling vhost_dev_set_features().
> >
> > So the key point of solving this issue making the acked_features
> > in NetVhostUserState up-to-date, these patchset provide this
> > solution.
> >
> > [PATCH 1/2]: Abstract the existing code of saving acked_features
> > into vhost_user_save_acked_features so the next
> > patch seems clean.
> >
> > [PATCH 2/2]: Save the acked_features to NetVhostUserState once
> > Guest virtio driver configured. This step makes
> > acked_features in NetVhostUserState up-to-date.
> >
> > Please review, any comments and suggestions are welcome.
> >
> > Best regard.
> >
> > Yong
> >
> > Hyman Huang (2):
> > vhost-user: Refactor vhost acked features saving
> > vhost-net: Fix the virito features negotiation flaw
> >
> > hw/net/vhost_net.c | 9 +++++++++
> > hw/net/virtio-net.c | 5 +++++
> > include/net/vhost-user.h | 2 ++
> > include/net/vhost_net.h | 2 ++
> > net/vhost-user.c | 35 +++++++++++++++++++----------------
> > 5 files changed, 37 insertions(+), 16 deletions(-)
> >
>
> --
> Best regard
>
> Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/2] Fix the virito features negotiation flaw
2022-10-26 15:00 ` Michael S. Tsirkin
@ 2022-10-26 16:31 ` Hyman Huang
0 siblings, 0 replies; 9+ messages in thread
From: Hyman Huang @ 2022-10-26 16:31 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Jason Wang, tugy, baiyw2, dengpc12, liuym16, yangchg,
yuanmh12, zhoupx, lic121
在 2022/10/26 23:00, Michael S. Tsirkin 写道:
> I guess I'll apply this. Can you fix a typo in subject though?
Of course yes. :)
> Would also be nice if we had a test for this behaviour.
Ok, i'll add a test patch next version.
>
>
> On Wed, Oct 05, 2022 at 04:17:30PM +0800, Hyman Huang wrote:
>> Ping,
>> Hi, Michael and Jason, how does this patchset feel think? :)
>> Sorry if i made noise.
>>
>> Yong
>>
>> 在 2022/9/26 14:36, huangy81@chinatelecom.cn 写道:
>>> From: "Hyman Huang(黄勇)" <huangy81@chinatelecom.cn>
>>>
>>> This patchset aim to fix the unexpected negotiation features for
>>> vhost-user netdev interface.
>>>
>>> Steps to reproduce the issue:
>>> Prepare a vm (CentOS 8 in my work scenario) with vhost-user
>>> backend interface and configure qemu as server mode. So dpdk
>>> would connect qemu's unix socket periodically.
>>>
>>> 1. start vm in background and restart openvswitch service
>>> concurrently and repeatedly in the process of vm start.
>>>
>>> 2. check if negotiated virtio features of port is "0x40000000" at
>>> dpdk side by executing:
>>> ovs-vsctl list interface | grep features | grep {port_socket_path}
>>> 3. if features equals "0x40000000", go to the vm and check if sending
>>> arp package works, executing:
>>> arping {IP_ADDR}
>>> if vm interface is configured to boot with dhcp protocol, it
>>> would get no ip.
>>>
>>> After doing above steps, we'll find the arping not work, the ovs on
>>> host side has forwarded unexpected arp packages, which be added 0x0000
>>> in the head of ethenet frame. Though qemu report some error when
>>> read/write cmd of vhost protocol during the process of vm start,
>>> like the following:
>>>
>>> "Failed to set msg fds"
>>> "vhost VQ 0 ring restore failed: -22: Invalid argument (22)"
>>>
>>> The vm does not stop or report more suggestive error message, it
>>> seems that everthing is ok.
>>>
>>> The root cause is that dpdk port negotiated nothing but only one
>>> VHOST_USER_F_PROTOCOL_FEATURES feature with vhost-user interface at
>>> qemu side, which is an unexpected behavior. qemu only load the
>>> VHOST_USER_F_PROTOCOL_FEATURES when VHOST_USER_SET_FEATURES and loss
>>> the guest features configured by front-end virtio driver using the
>>> VIRTIO_PCI_COMMON_GF addr, which is stored in acked_features field
>>> of struct vhost_dev.
>>>
>>> To explain how the acked_features disappear, we may need to know the
>>> lifecyle of acked_features in vhost_dev during feature negotiation.
>>>
>>> 1. qemu init acked_features field of struct vhost_dev in vhost_net_init()
>>> by calling vhost_net_ack_features(), the init value fetched from
>>> acked_features field of struct NetVhostUserState, which is the backup
>>> role after vhost stopping or unix socket closed.
>>> In the first time, the acked_features of struct NetVhostUserState is 0
>>> so the init value of vhost_dev's acked_features also 0.
>>>
>>> 2. when guest virtio driver set features, qemu accept the features and
>>> call virtio_set_features to store the features as acked_features in
>>> vhost_dev.
>>>
>>> 3. when unix socket closed or vhost_dev device doesn't work and be
>>> stopped unexpectedly, qemu will call chr_closed_bh or vhost_user_stop,
>>> which will copy acked_features from vhost_dev to NetVhostUserState and
>>> cleanup the vhost_dev. Since virtio driver not allowed to set features
>>> once status of virtio device changes to VIRTIO_CONFIG_S_FEATURE_OK,
>>> qemu need to backup it in case of loss.
>>> 4. once unix socket return to normal and get connected, qemu will
>>> call vhost_user_start to restore the vhost_dev and fetch the
>>> acked_features stored in NetVhostUserState previously.
>>>
>>> The above flow works fine in the normal scenarios, but it doesn't cover
>>> the scenario that openvswitch service restart in the same time of
>>> virtio features negotiation.
>>>
>>> Let's analyze such scenario:
>>> qemu dpdk
>>>
>>> vhost_net_init()
>>> | systemctl stop openvswitch.service
>>> virtio_set_features() |
>>> | systemctl start openvswitch.service
>>> virtio_set_status()
>>>
>>> Ovs stop service before guset setting virtio features, chr_closed_bh()
>>> be called and fetch acked_features in vhost_dev, if may store the
>>> incomplete features to NetVhostUserState since it doesn't include
>>> guest features, eg "0x40000000".
>>>
>>> Guest set virtio features with another features, eg "0x7060a782",
>>> this value will store in acked_features of vhost_dev, which is the
>>> right and up-to-date features.
>>>
>>> After ovs service show up, qemu unix socket get connected and call
>>> vhost_user_start(), which will restore acked_features of vhost_dev
>>> using NetVhostUserState and "0x40000000" be loaded, which is obsolete.
>>>
>>> Guest set virtio device status and therefore qemu call
>>> virtio_net_vhost_status finally, checking if vhost-net device has
>>> started, start it if not, consequently the obsolete acked_features
>>> "0x40000000" be negotiated after calling vhost_dev_set_features().
>>>
>>> So the key point of solving this issue making the acked_features
>>> in NetVhostUserState up-to-date, these patchset provide this
>>> solution.
>>>
>>> [PATCH 1/2]: Abstract the existing code of saving acked_features
>>> into vhost_user_save_acked_features so the next
>>> patch seems clean.
>>>
>>> [PATCH 2/2]: Save the acked_features to NetVhostUserState once
>>> Guest virtio driver configured. This step makes
>>> acked_features in NetVhostUserState up-to-date.
>>>
>>> Please review, any comments and suggestions are welcome.
>>>
>>> Best regard.
>>>
>>> Yong
>>>
>>> Hyman Huang (2):
>>> vhost-user: Refactor vhost acked features saving
>>> vhost-net: Fix the virito features negotiation flaw
>>>
>>> hw/net/vhost_net.c | 9 +++++++++
>>> hw/net/virtio-net.c | 5 +++++
>>> include/net/vhost-user.h | 2 ++
>>> include/net/vhost_net.h | 2 ++
>>> net/vhost-user.c | 35 +++++++++++++++++++----------------
>>> 5 files changed, 37 insertions(+), 16 deletions(-)
>>>
>>
>> --
>> Best regard
>>
>> Hyman Huang(黄勇)
>
--
Best regard
Hyman Huang(黄勇)
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-10-26 16:37 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-26 6:36 [PATCH 0/2] Fix the virito features negotiation flaw huangy81
2022-09-26 6:36 ` [PATCH 1/2] vhost-user: Refactor vhost acked features saving huangy81
2022-10-07 14:01 ` Michael S. Tsirkin
2022-10-07 14:57 ` Stefano Garzarella
2022-10-07 16:58 ` Hyman Huang
2022-09-26 6:36 ` [PATCH 2/2] vhost-net: Fix the virito features negotiation flaw huangy81
2022-10-05 8:17 ` [PATCH 0/2] " Hyman Huang
2022-10-26 15:00 ` Michael S. Tsirkin
2022-10-26 16:31 ` Hyman Huang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).