From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Bobby Eshleman <bobby.eshleman@gmail.com>
Cc: virtio-dev@lists.oasis-open.org,
Bobby Eshleman <bobby.eshleman@bytedance.com>,
Cong Wang <cong.wang@bytedance.com>,
Jiang Wang <jiang.wang@bytedance.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock
Date: Tue, 16 Aug 2022 02:31:22 +0000
Message-ID: <YvsBei6p30vw7b9I@bullseye>
In-Reply-To: <5a93c5aad99d79f028d349cb7e3c128c65d5d7e2.1660362668.git.bobby.eshleman@bytedance.com>
CC'ing virtio-dev@lists.oasis-open.org
On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> In order to support usage of qdisc on vsock traffic, this commit
> introduces a struct net_device to vhost and virtio vsock.
>
> Two new devices are created, vhost-vsock for vhost and virtio-vsock
> for virtio. The devices are attached to the respective transports.
>
> To bypass the usage of the device, the user may "down" the associated
> network interface using common tools. For example, "ip link set dev
> virtio-vsock down" lets vsock bypass the net_device and qdisc entirely,
> simply using the FIFO logic of the prior implementation.
>
> For both hosts and guests, there is one device for all G2H vsock sockets
> and one device for all H2G vsock sockets. This makes sense for guests
> because the driver only supports a single vsock channel (one pair of
> TX/RX virtqueues), so one device and qdisc fits. For hosts, this may not
> seem ideal for some workloads. However, it is possible to use a
> multi-queue qdisc, where a given queue is responsible for a range of
> sockets. This seems to be a better solution than having one device per
> socket, which may yield a very large number of devices and qdiscs, all
> of which are dynamically being created and destroyed. Because of this
> dynamism, it would also require a complex policy management daemon, as
> devices would constantly be spun up and down as sockets were created and
> destroyed. To avoid this, one device and qdisc also applies to all H2G
> sockets.
>
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
> drivers/vhost/vsock.c | 19 +++-
> include/linux/virtio_vsock.h | 10 +++
> net/vmw_vsock/virtio_transport.c | 19 +++-
> net/vmw_vsock/virtio_transport_common.c | 112 +++++++++++++++++++++++-
> 4 files changed, 152 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index f8601d93d94d..b20ddec2664b 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -927,13 +927,30 @@ static int __init vhost_vsock_init(void)
> VSOCK_TRANSPORT_F_H2G);
> if (ret < 0)
> return ret;
> - return misc_register(&vhost_vsock_misc);
> +
> + ret = virtio_transport_init(&vhost_transport, "vhost-vsock");
> + if (ret < 0)
> + goto out_unregister;
> +
> + ret = misc_register(&vhost_vsock_misc);
> + if (ret < 0)
> + goto out_transport_exit;
> + return ret;
> +
> +out_transport_exit:
> + virtio_transport_exit(&vhost_transport);
> +
> +out_unregister:
> + vsock_core_unregister(&vhost_transport.transport);
> + return ret;
> +
> };
>
> static void __exit vhost_vsock_exit(void)
> {
> misc_deregister(&vhost_vsock_misc);
> vsock_core_unregister(&vhost_transport.transport);
> + virtio_transport_exit(&vhost_transport);
> };
>
> module_init(vhost_vsock_init);
> diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> index 9a37eddbb87a..5d7e7fbd75f8 100644
> --- a/include/linux/virtio_vsock.h
> +++ b/include/linux/virtio_vsock.h
> @@ -91,10 +91,20 @@ struct virtio_transport {
> /* This must be the first field */
> struct vsock_transport transport;
>
> + /* Used almost exclusively for qdisc */
> + struct net_device *dev;
> +
> /* Takes ownership of the packet */
> int (*send_pkt)(struct sk_buff *skb);
> };
>
> +int
> +virtio_transport_init(struct virtio_transport *t,
> + const char *name);
> +
> +void
> +virtio_transport_exit(struct virtio_transport *t);
> +
> ssize_t
> virtio_transport_stream_dequeue(struct vsock_sock *vsk,
> struct msghdr *msg,
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 3bb293fd8607..c6212eb38d3c 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -131,7 +131,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
> * the vq
> */
> if (ret < 0) {
> - skb_queue_head(&vsock->send_pkt_queue, skb);
> + spin_lock_bh(&vsock->send_pkt_queue.lock);
> + __skb_queue_head(&vsock->send_pkt_queue, skb);
> + spin_unlock_bh(&vsock->send_pkt_queue.lock);
> break;
> }
>
> @@ -676,7 +678,9 @@ static void virtio_vsock_vqs_del(struct virtio_vsock *vsock)
> kfree_skb(skb);
> mutex_unlock(&vsock->tx_lock);
>
> - skb_queue_purge(&vsock->send_pkt_queue);
> + spin_lock_bh(&vsock->send_pkt_queue.lock);
> + __skb_queue_purge(&vsock->send_pkt_queue);
> + spin_unlock_bh(&vsock->send_pkt_queue.lock);
>
> /* Delete virtqueues and flush outstanding callbacks if any */
> vdev->config->del_vqs(vdev);
> @@ -760,6 +764,8 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> flush_work(&vsock->event_work);
> flush_work(&vsock->send_pkt_work);
>
> + virtio_transport_exit(&virtio_transport);
> +
> mutex_unlock(&the_virtio_vsock_mutex);
>
> kfree(vsock);
> @@ -844,12 +850,18 @@ static int __init virtio_vsock_init(void)
> if (ret)
> goto out_wq;
>
> - ret = register_virtio_driver(&virtio_vsock_driver);
> + ret = virtio_transport_init(&virtio_transport, "virtio-vsock");
> if (ret)
> goto out_vci;
>
> + ret = register_virtio_driver(&virtio_vsock_driver);
> + if (ret)
> + goto out_transport;
> +
> return 0;
>
> +out_transport:
> + virtio_transport_exit(&virtio_transport);
> out_vci:
> vsock_core_unregister(&virtio_transport.transport);
> out_wq:
> @@ -861,6 +873,7 @@ static void __exit virtio_vsock_exit(void)
> {
> unregister_virtio_driver(&virtio_vsock_driver);
> vsock_core_unregister(&virtio_transport.transport);
> + virtio_transport_exit(&virtio_transport);
> destroy_workqueue(virtio_vsock_workqueue);
> }
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index d5780599fe93..bdf16fff054f 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -16,6 +16,7 @@
>
> #include <net/sock.h>
> #include <net/af_vsock.h>
> +#include <net/pkt_sched.h>
>
> #define CREATE_TRACE_POINTS
> #include <trace/events/vsock_virtio_transport_common.h>
> @@ -23,6 +24,93 @@
> /* How long to wait for graceful shutdown of a connection */
> #define VSOCK_CLOSE_TIMEOUT (8 * HZ)
>
> +struct virtio_transport_priv {
> + struct virtio_transport *trans;
> +};
> +
> +static netdev_tx_t virtio_transport_start_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> + struct virtio_transport *t =
> + ((struct virtio_transport_priv *)netdev_priv(dev))->trans;
> + int ret;
> +
> + ret = t->send_pkt(skb);
> + if (unlikely(ret == -ENODEV))
> + return NETDEV_TX_BUSY;
> +
> + return NETDEV_TX_OK;
> +}
> +
> +const struct net_device_ops virtio_transport_netdev_ops = {
> + .ndo_start_xmit = virtio_transport_start_xmit,
> +};
> +
> +static void virtio_transport_setup(struct net_device *dev)
> +{
> + dev->netdev_ops = &virtio_transport_netdev_ops;
> + dev->needs_free_netdev = true;
> + dev->flags = IFF_NOARP;
> + dev->mtu = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> + dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
> +}
> +
> +static int ifup(struct net_device *dev)
> +{
> + int ret;
> +
> + rtnl_lock();
> + ret = dev_open(dev, NULL) ? -ENOMEM : 0;
> + rtnl_unlock();
> +
> + return ret;
> +}
> +
> +/* virtio_transport_init - initialize a virtio vsock transport layer
> + *
> + * @t: ptr to the virtio transport struct to initialize
> + * @name: the name of the net_device to be created.
> + *
> + * Return 0 on success, otherwise negative errno.
> + */
> +int virtio_transport_init(struct virtio_transport *t, const char *name)
> +{
> + struct virtio_transport_priv *priv;
> + int ret;
> +
> + t->dev = alloc_netdev(sizeof(*priv), name, NET_NAME_UNKNOWN, virtio_transport_setup);
> + if (!t->dev)
> + return -ENOMEM;
> +
> + priv = netdev_priv(t->dev);
> + priv->trans = t;
> +
> + ret = register_netdev(t->dev);
> + if (ret < 0)
> + goto out_free_netdev;
> +
> + ret = ifup(t->dev);
> + if (ret < 0)
> + goto out_unregister_netdev;
> +
> + return 0;
> +
> +out_unregister_netdev:
> + unregister_netdev(t->dev);
> +
> +out_free_netdev:
> + free_netdev(t->dev);
> +
> + return ret;
> +}
> +
> +void virtio_transport_exit(struct virtio_transport *t)
> +{
> + if (t->dev) {
> + unregister_netdev(t->dev);
> + free_netdev(t->dev);
> + }
> +}
> +
> static const struct virtio_transport *
> virtio_transport_get_ops(struct vsock_sock *vsk)
> {
> @@ -147,6 +235,24 @@ static u16 virtio_transport_get_type(struct sock *sk)
> return VIRTIO_VSOCK_TYPE_SEQPACKET;
> }
>
> +/* Return skb->len on success, otherwise negative errno */
> +static int virtio_transport_send_pkt(const struct virtio_transport *t, struct sk_buff *skb)
> +{
> + int ret;
> + int len = skb->len;
> +
> + if (unlikely(!t->dev || !(t->dev->flags & IFF_UP)))
> + return t->send_pkt(skb);
> +
> + skb->dev = t->dev;
> + ret = dev_queue_xmit(skb);
> +
> + if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN))
> + return len;
> +
> + return -ENOMEM;
> +}
> +
> /* This function can only be used on connecting/connected sockets,
> * since a socket assigned to a transport is required.
> *
> @@ -202,9 +308,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> - err = t_ops->send_pkt(skb);
> -
> - return err < 0 ? -ENOMEM : err;
> + return virtio_transport_send_pkt(t_ops, skb);
> }
>
> static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
> @@ -834,7 +938,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> return -ENOTCONN;
> }
>
> - return t->send_pkt(reply);
> + return virtio_transport_send_pkt(t, reply);
> }
>
> /* This function should be called with sk_lock held and SOCK_DONE set */
> --
> 2.35.1
>
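For anyone reading along on virtio-dev without the kernel tree handy, the
send-path decision in virtio_transport_send_pkt() above can be modelled in
user space roughly as follows. This is only a sketch under stated
assumptions: struct model_transport, model_pick_path(), model_send_pkt() and
the MODEL_* constants are hypothetical stand-ins invented for illustration,
not kernel APIs, and the real function operates on sk_buffs and calls
dev_queue_xmit() rather than taking return codes as parameters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel constants the patch relies on. The values mirror
 * IFF_UP and NET_XMIT_*, redefined here so the model builds in user space.
 */
#define MODEL_IFF_UP        0x1
#define MODEL_XMIT_SUCCESS  0x00
#define MODEL_XMIT_DROP     0x01
#define MODEL_XMIT_CN       0x02

/* Hypothetical reduction of struct virtio_transport to the two facts the
 * send path inspects.
 */
struct model_transport {
	int has_dev;            /* models t->dev != NULL */
	unsigned int dev_flags; /* models t->dev->flags */
};

enum model_path { MODEL_PATH_DIRECT, MODEL_PATH_QDISC };

/* No device, or device "down": bypass the qdisc and use the old FIFO path. */
static enum model_path model_pick_path(const struct model_transport *t)
{
	assert(t != NULL);

	if (!t->has_dev || !(t->dev_flags & MODEL_IFF_UP))
		return MODEL_PATH_DIRECT;

	return MODEL_PATH_QDISC;
}

/* direct_ret models what t->send_pkt() would return, xmit_ret models what
 * dev_queue_xmit() would return. pkt_len is passed in up front, mirroring
 * how the patch reads skb->len before handing the skb to the qdisc.
 */
static int model_send_pkt(const struct model_transport *t, int pkt_len,
			  int direct_ret, int xmit_ret)
{
	if (model_pick_path(t) == MODEL_PATH_DIRECT)
		return direct_ret;

	if (xmit_ret == MODEL_XMIT_SUCCESS || xmit_ret == MODEL_XMIT_CN)
		return pkt_len;

	return -ENOMEM;
}
```

In this model, a transport whose device is up and whose xmit result is
MODEL_XMIT_CN still reports the full packet length, while MODEL_XMIT_DROP
is reported as -ENOMEM. With the device down, the caller sees whatever the
direct send returned.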