From: Jason Wang <jasowang@redhat.com>
To: "Longpeng(Mike)" <longpeng2@huawei.com>
Cc: mst <mst@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
yechuan@huawei.com, Yongji Xie <xieyongji@bytedance.com>,
"Gonglei \(Arei\)" <arei.gonglei@huawei.com>,
Parav Pandit <parav@nvidia.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [RFC] vhost-vdpa-net: add vhost-vdpa-net host device support
Date: Wed, 8 Dec 2021 14:27:04 +0800
Message-ID: <CACGkMEvW0W-mMU159bUyDo2jK03FYQEn3ZedbC9vaEvDj2v7KQ@mail.gmail.com>
In-Reply-To: <20211208052010.1719-1-longpeng2@huawei.com>
On Wed, Dec 8, 2021 at 1:20 PM Longpeng(Mike) <longpeng2@huawei.com> wrote:
>
> From: Longpeng <longpeng2@huawei.com>
>
> Hi guys,
>
> This patch introduces the vhost-vdpa-net device, which is inspired
> by vhost-user-blk and the vhost-vdpa-blk device proposal [1].
>
> I've tested this patch on Huawei's offload card:
> ./x86_64-softmmu/qemu-system-x86_64 \
> -device vhost-vdpa-net-pci,vdpa-dev=/dev/vhost-vdpa-0
>
> For virtio hardware offloading, the most important requirement for us
> is to support live migration between offloading cards from different
> vendors; the combination of netdev and virtio-net seems too heavy, so
> we prefer a lightweight way.
Could you elaborate more on this? When used with netdev, it's mainly the
control path that differs, and it provides a lot of other benefits (a
minimal command-line sketch follows this list):
- decoupling the transport-specific stuff out of the vhost abstraction;
an MMIO device is supported with zero lines of extra code
- migration compatibility: reusing the migration stream already
supported by QEMU's virtio-net allows migration among different vhost
backends
- a software mediation facility: not all virtqueues are assigned to the
guest directly. One example is the virtio-net cvq, which QEMU may want
to intercept in order to record device state for migration. Reusing the
current virtio-net code simplifies this a lot
- transparent failover (in the future): the NIC model can choose to
switch between vhost backends, etc.
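
For reference, the existing netdev-based invocation looks like this (the
device path and id below are just illustrative):

./x86_64-softmmu/qemu-system-x86_64 \
    -netdev vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0 \
    -device virtio-net-pci,netdev=vdpa0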
>
> Maybe we could support both in the future?
For the net, we need to figure out the advantages of this approach
first. Note that we didn't have vhost-user-net-pci or vhost-pci in the
past.
For the block, I will leave Stefan and Stefano to comment.
> Such as:
>
> * Lightweight
> Net: vhost-vdpa-net
> Storage: vhost-vdpa-blk
>
> * Heavy but more powerful
> Net: netdev + virtio-net + vhost-vdpa
> Storage: bdrv + virtio-blk + vhost-vdpa
>
> [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg797569.html
>
> Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
> ---
> hw/net/meson.build | 1 +
> hw/net/vhost-vdpa-net.c | 338 +++++++++++++++++++++++++++++++++++++
> hw/virtio/Kconfig | 5 +
> hw/virtio/meson.build | 1 +
> hw/virtio/vhost-vdpa-net-pci.c | 118 +++++++++++++
I'd expect there to be no device-type-specific code in this approach, so
that any kind of vDPA device could be used with a generic PCI device.
Any reason for having net-specific types here?
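
To illustrate, I'd imagine something like the following working for any
kind of vDPA device, not just net (the device name and property below
are purely hypothetical; they exist neither in this patch nor in QEMU
today):

    -device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0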
> include/hw/virtio/vhost-vdpa-net.h | 31 ++++
> include/net/vhost-vdpa.h | 2 +
> net/vhost-vdpa.c | 2 +-
> 8 files changed, 497 insertions(+), 1 deletion(-)
> create mode 100644 hw/net/vhost-vdpa-net.c
> create mode 100644 hw/virtio/vhost-vdpa-net-pci.c
> create mode 100644 include/hw/virtio/vhost-vdpa-net.h
>
> diff --git a/hw/net/meson.build b/hw/net/meson.build
> index bdf71f1..139ebc4 100644
> --- a/hw/net/meson.build
> +++ b/hw/net/meson.build
> @@ -44,6 +44,7 @@ specific_ss.add(when: 'CONFIG_XILINX_ETHLITE', if_true: files('xilinx_ethlite.c'
>
> softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('net_rx_pkt.c'))
> specific_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('virtio-net.c'))
> +specific_ss.add(when: 'CONFIG_VHOST_VDPA_NET', if_true: files('vhost-vdpa-net.c'))
>
> softmmu_ss.add(when: ['CONFIG_VIRTIO_NET', 'CONFIG_VHOST_NET'], if_true: files('vhost_net.c'), if_false: files('vhost_net-stub.c'))
> softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost_net-stub.c'))
> diff --git a/hw/net/vhost-vdpa-net.c b/hw/net/vhost-vdpa-net.c
> new file mode 100644
> index 0000000..48b99f9
> --- /dev/null
> +++ b/hw/net/vhost-vdpa-net.c
> @@ -0,0 +1,338 @@
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "qemu/cutils.h"
> +#include "hw/qdev-core.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/qdev-properties-system.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-vdpa-net.h"
> +#include "hw/virtio/virtio.h"
> +#include "hw/virtio/virtio-bus.h"
> +#include "hw/virtio/virtio-access.h"
> +#include "sysemu/sysemu.h"
> +#include "sysemu/runstate.h"
> +#include "net/vhost-vdpa.h"
> +
> +static void vhost_vdpa_net_get_config(VirtIODevice *vdev, uint8_t *config)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> +
> + memcpy(config, &s->netcfg, sizeof(struct virtio_net_config));
> +}
> +
> +static void vhost_vdpa_net_set_config(VirtIODevice *vdev, const uint8_t *config)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + struct virtio_net_config *netcfg = (struct virtio_net_config *)config;
> + int ret;
> +
> + ret = vhost_dev_set_config(&s->dev, (uint8_t *)netcfg, 0, sizeof(*netcfg),
> + VHOST_SET_CONFIG_TYPE_MASTER);
> + if (ret) {
> + error_report("set device config space failed");
> + return;
> + }
> +}
> +
> +static uint64_t vhost_vdpa_net_get_features(VirtIODevice *vdev,
> + uint64_t features,
> + Error **errp)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> +
> + virtio_add_feature(&features, VIRTIO_NET_F_CSUM);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_CSUM);
> + virtio_add_feature(&features, VIRTIO_NET_F_MAC);
> + virtio_add_feature(&features, VIRTIO_NET_F_GSO);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_TSO4);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_TSO6);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_ECN);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_UFO);
> + virtio_add_feature(&features, VIRTIO_NET_F_GUEST_ANNOUNCE);
> + virtio_add_feature(&features, VIRTIO_NET_F_HOST_TSO4);
> + virtio_add_feature(&features, VIRTIO_NET_F_HOST_TSO6);
> + virtio_add_feature(&features, VIRTIO_NET_F_HOST_ECN);
> + virtio_add_feature(&features, VIRTIO_NET_F_HOST_UFO);
> + virtio_add_feature(&features, VIRTIO_NET_F_MRG_RXBUF);
> + virtio_add_feature(&features, VIRTIO_NET_F_STATUS);
> + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_VQ);
> + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_RX);
> + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_VLAN);
> + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_RX_EXTRA);
> + virtio_add_feature(&features, VIRTIO_NET_F_CTRL_MAC_ADDR);
> + virtio_add_feature(&features, VIRTIO_NET_F_MQ);
Any reason for those hand-crafted features?
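
In case it helps, a minimal sketch of relying solely on the
backend-reported features (i.e. just dropping the hand-crafted list;
untested):

static uint64_t vhost_vdpa_net_get_features(VirtIODevice *vdev,
                                            uint64_t features,
                                            Error **errp)
{
    VHostVdpaNet *s = VHOST_VDPA_NET(vdev);

    /* Let the vDPA backend decide which features to offer instead of
     * hand-crafting the list in the device model. */
    return vhost_get_features(&s->dev, vdpa_feature_bits, features);
}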
> +
> + return vhost_get_features(&s->dev, vdpa_feature_bits, features);
> +}
> +
> +static int vhost_vdpa_net_start(VirtIODevice *vdev, Error **errp)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> + int i, ret;
> +
> + if (!k->set_guest_notifiers) {
> + error_setg(errp, "binding does not support guest notifiers");
> + return -ENOSYS;
> + }
> +
> + ret = vhost_dev_enable_notifiers(&s->dev, vdev);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Error enabling host notifiers");
> + return ret;
> + }
> +
> + ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Error binding guest notifier");
> + goto err_host_notifiers;
> + }
> +
> + s->dev.acked_features = vdev->guest_features;
> +
> + ret = vhost_dev_start(&s->dev, vdev);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret, "Error starting vhost");
> + goto err_guest_notifiers;
> + }
> + s->started = true;
> +
> + /* guest_notifier_mask/pending not used yet, so just unmask
> + * everything here. virtio-pci will do the right thing by
> + * enabling/disabling irqfd.
> + */
> + for (i = 0; i < s->dev.nvqs; i++) {
> + vhost_virtqueue_mask(&s->dev, vdev, i, false);
> + }
> +
> + return ret;
> +
> +err_guest_notifiers:
> + k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> +err_host_notifiers:
> + vhost_dev_disable_notifiers(&s->dev, vdev);
> + return ret;
> +}
> +
> +static void vhost_vdpa_net_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + Error *local_err = NULL;
> + int i, ret;
> +
> + if (!vdev->start_on_kick) {
> + return;
> + }
> +
> + if (s->dev.started) {
> + return;
> + }
> +
> + /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> + ret = vhost_vdpa_net_start(vdev, &local_err);
> + if (ret < 0) {
> + error_reportf_err(local_err, "vhost-vdpa-net: start failed: ");
> + return;
> + }
> +
> + /* Kick right away to begin processing requests already in vring */
> + for (i = 0; i < s->dev.nvqs; i++) {
> + VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> + if (!virtio_queue_get_desc_addr(vdev, i)) {
> + continue;
> + }
> + event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> + }
> +}
> +
> +static void vhost_vdpa_net_stop(VirtIODevice *vdev)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> + int ret;
> +
> + if (!s->started) {
> + return;
> + }
> + s->started = false;
> +
> + if (!k->set_guest_notifiers) {
> + return;
> + }
> +
> + vhost_dev_stop(&s->dev, vdev);
> +
> + ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> + if (ret < 0) {
> + error_report("vhost guest notifier cleanup failed: %d", ret);
> + return;
> + }
> +
> + vhost_dev_disable_notifiers(&s->dev, vdev);
> +}
> +
> +static void vhost_vdpa_net_set_status(VirtIODevice *vdev, uint8_t status)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + bool should_start = virtio_device_started(vdev, status);
> + Error *local_err = NULL;
> + int ret;
> +
> + if (!vdev->vm_running) {
> + should_start = false;
> + }
> +
> + if (s->started == should_start) {
> + return;
> + }
> +
> + if (should_start) {
> + ret = vhost_vdpa_net_start(vdev, &local_err);
> + if (ret < 0) {
> + error_reportf_err(local_err, "vhost-vdpa-net: start failed: ");
> + }
> + } else {
> + vhost_vdpa_net_stop(vdev);
> + }
> +}
> +
> +static void vhost_vdpa_net_unrealize(VHostVdpaNet *s)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(s);
> + int i;
> +
> + for (i = 0; i < s->queue_pairs * 2; i++) {
> + virtio_delete_queue(s->virtqs[i]);
> + }
> + /* ctrl vq */
> + virtio_delete_queue(s->virtqs[i]);
> +
> + g_free(s->virtqs);
> + virtio_cleanup(vdev);
> +}
> +
> +static void vhost_vdpa_net_device_realize(DeviceState *dev, Error **errp)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> + int i, ret;
> +
> + s->vdpa.device_fd = qemu_open_old(s->vdpa_dev, O_RDWR);
> + if (s->vdpa.device_fd == -1) {
> + error_setg(errp, "vhost-vdpa-net: open %s failed: %s",
> + s->vdpa_dev, strerror(errno));
> + return;
> + }
> +
> + virtio_init(vdev, "virtio-net", VIRTIO_ID_NET,
> + sizeof(struct virtio_net_config));
> +
> + s->dev.nvqs = s->queue_pairs * 2 + 1;
> + s->dev.vqs = g_new0(struct vhost_virtqueue, s->dev.nvqs);
> + s->dev.vq_index = 0;
> + s->dev.vq_index_end = s->dev.nvqs;
> + s->dev.backend_features = 0;
> + s->started = false;
> +
> + s->virtqs = g_new0(VirtQueue *, s->dev.nvqs);
> + for (i = 0; i < s->dev.nvqs; i++) {
> + s->virtqs[i] = virtio_add_queue(vdev, s->queue_size,
> + vhost_vdpa_net_handle_output);
We should check whether MQ is negotiated, since the control vq index
varies depending on that.
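
Something along these lines, perhaps (a hypothetical helper, only meant
to illustrate the dependency on MQ negotiation):

/* The control vq index depends on whether VIRTIO_NET_F_MQ was
 * negotiated: it is 2 when MQ is off and 2 * queue_pairs when it is on. */
static int vhost_vdpa_net_cvq_index(VirtIODevice *vdev, VHostVdpaNet *s)
{
    if (virtio_vdev_has_feature(vdev, VIRTIO_NET_F_MQ)) {
        return s->queue_pairs * 2;
    }
    return 2;
}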
> + }
> +
> + ret = vhost_dev_init(&s->dev, &s->vdpa, VHOST_BACKEND_TYPE_VDPA, 0, NULL);
> + if (ret < 0) {
> + error_setg(errp, "vhost-vdpa-net: vhost initialization failed: %s",
> + strerror(-ret));
> + goto init_err;
> + }
> +
> + ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->netcfg,
> + sizeof(struct virtio_net_config), NULL);
> + if (ret < 0) {
> + error_setg(errp, "vhost-vdpa-net: get network config failed");
> + goto config_err;
> + }
> +
> + return;
> +config_err:
> + vhost_dev_cleanup(&s->dev);
> +init_err:
> + vhost_vdpa_net_unrealize(s);
> + close(s->vdpa.device_fd);
> +}
> +
> +static void vhost_vdpa_net_device_unrealize(DeviceState *dev)
> +{
> + VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> + VHostVdpaNet *s = VHOST_VDPA_NET(vdev);
> +
> + virtio_set_status(vdev, 0);
> + vhost_dev_cleanup(&s->dev);
> + vhost_vdpa_net_unrealize(s);
> + close(s->vdpa.device_fd);
> +}
> +
> +static const VMStateDescription vmstate_vhost_vdpa_net = {
> + .name = "vhost-vdpa-net",
> + .minimum_version_id = 1,
> + .version_id = 1,
> + .fields = (VMStateField[]) {
> + VMSTATE_VIRTIO_DEVICE,
> + VMSTATE_END_OF_LIST()
> + },
> +};
> +
> +static void vhost_vdpa_net_instance_init(Object *obj)
> +{
> + VHostVdpaNet *s = VHOST_VDPA_NET(obj);
> +
> + device_add_bootindex_property(obj, &s->bootindex, "bootindex",
> + "/ethernet-phy@0,0", DEVICE(obj));
> +}
> +
> +static Property vhost_vdpa_net_properties[] = {
> + DEFINE_PROP_STRING("vdpa-dev", VHostVdpaNet, vdpa_dev),
> + DEFINE_PROP_UINT16("queue-pairs", VHostVdpaNet, queue_pairs,
> + VHOST_VDPA_NET_AUTO_QUEUE_PAIRS),
Any reason why we need the queue-pairs parameter? Note that it is
expected to be provisioned via netlink through the management device.
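
I had the iproute2 vdpa tool in mind, e.g. (the device and mgmtdev names
are illustrative, and the max_vqp attribute depends on kernel and tool
support):

vdpa mgmtdev show
vdpa dev add name vdpa0 mgmtdev pci/0000:3b:00.1 max_vqp 4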
Thanks