From: Jason Wang <jasowang@redhat.com>
To: Yuri Benditovich <yuri.benditovich@daynix.com>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>,
Dmitry Fleytman <dmitry.fleytman@gmail.com>,
Sriram Yagnaraman <sriram.yagnaraman@est.tech>,
"Michael S. Tsirkin" <mst@redhat.com>,
Luigi Rizzo <rizzo@iet.unipi.it>,
Giuseppe Lettieri <g.lettieri@iet.unipi.it>,
Vincenzo Maffione <v.maffione@gmail.com>,
Andrew Melnychenko <andrew@daynix.com>,
qemu-devel@nongnu.org, Jonathon Jongsma <jjongsma@redhat.com>
Subject: Re: [PATCH v9 13/20] virtio-net: Return an error when vhost cannot enable RSS
Date: Wed, 17 Apr 2024 12:18:21 +0800
Message-ID: <CACGkMEvNVCVsQVM_meeW9eAZwuUvzX8Jir3TcBS2t_uk9_O+vQ@mail.gmail.com>
In-Reply-To: <CAOEp5OdNGmCbmrqPCW4Pp3boOVxF+JGMPaVM3utjV0gC0emY2g@mail.gmail.com>
On Tue, Apr 16, 2024 at 5:51 PM Yuri Benditovich
<yuri.benditovich@daynix.com> wrote:
>
> On Tue, Apr 16, 2024 at 10:14 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Apr 16, 2024 at 1:43 PM Yuri Benditovich
> > <yuri.benditovich@daynix.com> wrote:
> > >
> > > On Tue, Apr 16, 2024 at 7:00 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Mon, Apr 15, 2024 at 10:05 PM Yuri Benditovich
> > > > <yuri.benditovich@daynix.com> wrote:
> > > > >
> > > > > On Wed, Apr 3, 2024 at 2:11 PM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> > > > > >
> > > > > > vhost requires eBPF for RSS. When eBPF is not available, virtio-net
> > > > > > implicitly disables RSS even if the user explicitly requests it. Return
> > > > > > an error instead of implicitly disabling RSS if RSS is requested but not
> > > > > > available.
> > > > > >
> > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > ---
> > > > > > hw/net/virtio-net.c | 97 ++++++++++++++++++++++++++---------------------------
> > > > > > 1 file changed, 48 insertions(+), 49 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > > > index 61b49e335dea..3d53eba88cfc 100644
> > > > > > --- a/hw/net/virtio-net.c
> > > > > > +++ b/hw/net/virtio-net.c
> > > > > > @@ -793,9 +793,6 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> > > > > > return features;
> > > > > > }
> > > > > >
> > > > > > - if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
> > > > > > - virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
> > > > > > - }
> > > > > > features = vhost_net_get_features(get_vhost_net(nc->peer), features);
> > > > > > vdev->backend_features = features;
> > > > > >
> > > > > > @@ -3591,6 +3588,50 @@ static bool failover_hide_primary_device(DeviceListener *listener,
> > > > > > return qatomic_read(&n->failover_primary_hidden);
> > > > > > }
> > > > > >
> > > > > > +static void virtio_net_device_unrealize(DeviceState *dev)
> > > > > > +{
> > > > > > + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > > > + VirtIONet *n = VIRTIO_NET(dev);
> > > > > > + int i, max_queue_pairs;
> > > > > > +
> > > > > > + if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > > > > > + virtio_net_unload_ebpf(n);
> > > > > > + }
> > > > > > +
> > > > > > + /* This will stop vhost backend if appropriate. */
> > > > > > + virtio_net_set_status(vdev, 0);
> > > > > > +
> > > > > > + g_free(n->netclient_name);
> > > > > > + n->netclient_name = NULL;
> > > > > > + g_free(n->netclient_type);
> > > > > > + n->netclient_type = NULL;
> > > > > > +
> > > > > > + g_free(n->mac_table.macs);
> > > > > > + g_free(n->vlans);
> > > > > > +
> > > > > > + if (n->failover) {
> > > > > > + qobject_unref(n->primary_opts);
> > > > > > + device_listener_unregister(&n->primary_listener);
> > > > > > + migration_remove_notifier(&n->migration_state);
> > > > > > + } else {
> > > > > > + assert(n->primary_opts == NULL);
> > > > > > + }
> > > > > > +
> > > > > > + max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > > > > > + for (i = 0; i < max_queue_pairs; i++) {
> > > > > > + virtio_net_del_queue(n, i);
> > > > > > + }
> > > > > > + /* delete also control vq */
> > > > > > + virtio_del_queue(vdev, max_queue_pairs * 2);
> > > > > > + qemu_announce_timer_del(&n->announce_timer, false);
> > > > > > + g_free(n->vqs);
> > > > > > + qemu_del_nic(n->nic);
> > > > > > + virtio_net_rsc_cleanup(n);
> > > > > > + g_free(n->rss_data.indirections_table);
> > > > > > + net_rx_pkt_uninit(n->rx_pkt);
> > > > > > + virtio_cleanup(vdev);
> > > > > > +}
> > > > > > +
> > > > > > static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> > > > > > {
> > > > > > VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > > > @@ -3760,53 +3801,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> > > > > >
> > > > > > net_rx_pkt_init(&n->rx_pkt);
> > > > > >
> > > > > > - if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > > > > > - virtio_net_load_ebpf(n);
> > > > > > - }
> > > > > > -}
> > > > > > -
> > > > > > -static void virtio_net_device_unrealize(DeviceState *dev)
> > > > > > -{
> > > > > > - VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > > > - VirtIONet *n = VIRTIO_NET(dev);
> > > > > > - int i, max_queue_pairs;
> > > > > > -
> > > > > > - if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
> > > > > > - virtio_net_unload_ebpf(n);
> > > > > > + if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
> > > > > > + !virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
> > > > > > + virtio_net_device_unrealize(dev);
> > > > > > + error_setg(errp, "Can't load eBPF RSS for vhost");
> > > > > > }
> > > > >
> > > > > As I already mentioned, I think it is an extremely bad idea to
> > > > > fail to run qemu for a reason such as the absence of one feature.
> > > > > What I suggest is:
> > > > > 1. Redefine rss as tri-state (off|auto|on)
> > > > > 2. Fail to run only if rss is on and not available via ebpf
> > > > > 3. On auto - silently drop it
> > > >
> > > > "Auto" might be promatic for migration compatibility which is hard to
> > > > be used by management layers like libvirt. The reason is that there's
> > > > no way for libvirt to know if it is supported by device or not.
> > >
> > > In terms of migration, every feature that somehow depends on the kernel
> > > is problematic, not only RSS.
> >
> > True, but if we can avoid more, it would still be better.
> >
> > > Last time we added the USO feature - is
> > > it any different?
> >
> > I may be missing something, but we never defined a tristate for USO?
> >
> > DEFINE_PROP_BIT64("guest_uso4", VirtIONet, host_features,
> > VIRTIO_NET_F_GUEST_USO4, true),
> > DEFINE_PROP_BIT64("guest_uso6", VirtIONet, host_features,
> > VIRTIO_NET_F_GUEST_USO6, true),
> > DEFINE_PROP_BIT64("host_uso", VirtIONet, host_features,
> > VIRTIO_NET_F_HOST_USO, true),
> >
> When I added the USO feature I followed the existing approach of
> virtio-net: in get_features, check what was "requested", including the
> features that were "on" by default, and drop those that aren't supported
> (vhost by itself can also drop some features).
>
> Still, if the source machine has a kernel that supports USO (visible in
> the TAP flags) and the destination has an older kernel without such
> support, the migration will probably fail.

I may be missing something: do we have anything USO-specific to migrate?
If not, the migration won't fail. And even if the migration did fail,
that's still not good.
The kernel has intended to remove UFO support since 2016, but doing so
breaks migration, so there was no choice but to bring UFO back (via
emulation).
>
> The solution available today is, for example, to lower the machine
> generation in the libvirt profile, aligning the generation across all
> the machines that are expected to participate in the migration.
>
> IMO we should think about some _generic_ solution, for example feature
> negotiation between machines before the migration: if the driver
> receives a notification from the device, it can negotiate the change of
> hardware features with the OS (at least for most of them).
> Not trivial, but IMO better than just failing the execution.

Adding Jonathon.

Yes, technically libvirt can detect the support for USO/RSS and
generate the correct qemu command line. But what I want to say is that
failing at launch is still better than failing the workload running in
the guest.
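
For concreteness, Yuri's tri-state suggestion (points 1-3 above) would
look roughly like this in qdev terms. This is only a sketch: "rss_mode"
is a hypothetical new OnOffAuto field in VirtIONet, not something in
the current code (the existing "rss" property is a plain feature bit),
and the realize hunk mirrors the structure of the patch under
discussion:

    /* off|auto|on property, defaulting to auto. */
    DEFINE_PROP_ON_OFF_AUTO("rss", VirtIONet, rss_mode, ON_OFF_AUTO_AUTO),

    /* In virtio_net_device_realize(), instead of failing unconditionally: */
    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS) &&
        !virtio_net_load_ebpf(n) && get_vhost_net(nc->peer)) {
        if (n->rss_mode == ON_OFF_AUTO_ON) {
            /* Explicit "on": fail realize, as this patch does. */
            virtio_net_device_unrealize(dev);
            error_setg(errp, "Can't load eBPF RSS for vhost");
            return;
        }
        /* "auto": silently drop the feature, as before this patch. */
        virtio_clear_feature(&n->host_features, VIRTIO_NET_F_RSS);
    }

The sketch only shows where the off|auto|on decision would sit; whether
"auto" is acceptable at all is exactly the migration question discussed
below.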
>
> > ?
> > > And in terms of migration, "rss=on" is problematic in the same way as "rss=auto".
> >
> > Failing early when launching QEMU is better than failing silently in
> > the guest after a migration.
>
> Do I understand correctly - you mean fail qemu initialization on the
> destination machine?

Yes, it's a hint to the management layer that its migration
compatibility check is wrong.
>
> >
> > > Can you please show one scenario of migration where they will behave
> > > differently?
> >
> > If you mean the problem with "auto", here's one:
> >
> > Assume auto is used on both src and dst. On the source, RSS is enabled,
> > but not on the destination. RSS then fails to work after the migration.
>
> I think in this case the migration will fail when set_features is
> called on the destination -
> the same way as with "on". Am I mistaken?

See above.
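
To spell the scenario out: with rss=auto on both sides, the source host
has eBPF available, so the device offers VIRTIO_NET_F_RSS and the guest
negotiates it; the destination host lacks eBPF, so the device there
silently drops the bit. After the migration the guest still assumes RSS
was negotiated while the destination device cannot provide it - whether
that surfaces as a set_features failure or as RSS silently not working,
the mismatch comes from "auto" resolving differently on the two hosts.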
>
> >
> > > And in terms of regular experience there is a big advantage.
> >
> > Similarly, silently clearing a feature is also not good:
> >
> > if (!peer_has_vnet_hdr(n)) {
> > virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
> > virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
> > virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO6);
> > virtio_clear_feature(&features, VIRTIO_NET_F_HOST_ECN);
> >
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> >
> > virtio_clear_feature(&features, VIRTIO_NET_F_HOST_USO);
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO4);
> > virtio_clear_feature(&features, VIRTIO_NET_F_GUEST_USO6);
> >
> > virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
> > }
> >
> > The reason we never see complaints is probably that vhost/TAP is the
> > only backend that supports migration, and vnet header support there
> > has existed for more than a decade.
>
> I think we never see complaints because we have not added new features
> for a long time.

Probably, but I basically meant that peer_has_vnet_hdr() is always true
for the cases we support, so QEMU won't silently clear those features
even if they were turned on explicitly on the qemu command line.

Thanks
>
> >
> > Thanks
> >
> >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > > 4. The same with the 'hash' option - it is not compatible with vhost (at
> > > > > least at the moment)
> > > > > 5. Reformat the patch, as it is hard to review when entire procedures
> > > > > are replaced - i.e. one patch that moves the code without changes,
> > > > > and another one with the real changes.
> > > > > If this is hard to review only for me - please ignore that.
> > > > >
> > > > > > -
> > > > > > - /* This will stop vhost backend if appropriate. */
> > > > > > - virtio_net_set_status(vdev, 0);
> > > > > > -
> > > > > > - g_free(n->netclient_name);
> > > > > > - n->netclient_name = NULL;
> > > > > > - g_free(n->netclient_type);
> > > > > > - n->netclient_type = NULL;
> > > > > > -
> > > > > > - g_free(n->mac_table.macs);
> > > > > > - g_free(n->vlans);
> > > > > > -
> > > > > > - if (n->failover) {
> > > > > > - qobject_unref(n->primary_opts);
> > > > > > - device_listener_unregister(&n->primary_listener);
> > > > > > - migration_remove_notifier(&n->migration_state);
> > > > > > - } else {
> > > > > > - assert(n->primary_opts == NULL);
> > > > > > - }
> > > > > > -
> > > > > > - max_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > > > > > - for (i = 0; i < max_queue_pairs; i++) {
> > > > > > - virtio_net_del_queue(n, i);
> > > > > > - }
> > > > > > - /* delete also control vq */
> > > > > > - virtio_del_queue(vdev, max_queue_pairs * 2);
> > > > > > - qemu_announce_timer_del(&n->announce_timer, false);
> > > > > > - g_free(n->vqs);
> > > > > > - qemu_del_nic(n->nic);
> > > > > > - virtio_net_rsc_cleanup(n);
> > > > > > - g_free(n->rss_data.indirections_table);
> > > > > > - net_rx_pkt_uninit(n->rx_pkt);
> > > > > > - virtio_cleanup(vdev);
> > > > > > }
> > > > > >
> > > > > > static void virtio_net_reset(VirtIODevice *vdev)
> > > > > >
> > > > > > --
> > > > > > 2.44.0
> > > > > >
> > > > >
> > > >
> > >
> >
>