From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Arseniy Krasnov <oxffffaa@gmail.com>
Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Jason Wang <jasowang@redhat.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Wei Liu <wei.liu@kernel.org>, Dexuan Cui <decui@microsoft.com>,
Bryan Tan <bryantan@vmware.com>, Vishnu Dasa <vdasa@vmware.com>,
VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
Dan Carpenter <dan.carpenter@linaro.org>,
Simon Horman <simon.horman@corigine.com>,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-hyperv@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH RFC net-next v5 11/14] vhost/vsock: implement datagram support
Date: Wed, 26 Jul 2023 17:55:04 +0000 [thread overview]
Message-ID: <ZMFd+Jd/LrfpJsVA@bullseye> (raw)
In-Reply-To: <b15d237e-31b5-40ae-83fc-e71649febd2b@gmail.com>
On Sat, Jul 22, 2023 at 11:42:38AM +0300, Arseniy Krasnov wrote:
>
>
> On 19.07.2023 03:50, Bobby Eshleman wrote:
> > This commit implements datagram support for vhost/vsock by teaching
> > vhost to use the common virtio transport datagram functions.
> >
> > If the virtio RX buffer is too small, then the transmission is
> > abandoned, the packet dropped, and EHOSTUNREACH is added to the socket's
> > error queue.
> >
> > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > ---
> > drivers/vhost/vsock.c | 62 +++++++++++++++++++++++++++++++++++++++++++++---
> > net/vmw_vsock/af_vsock.c | 5 +++-
> > 2 files changed, 63 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> > index d5d6a3c3f273..da14260c6654 100644
> > --- a/drivers/vhost/vsock.c
> > +++ b/drivers/vhost/vsock.c
> > @@ -8,6 +8,7 @@
> > */
> > #include <linux/miscdevice.h>
> > #include <linux/atomic.h>
> > +#include <linux/errqueue.h>
> > #include <linux/module.h>
> > #include <linux/mutex.h>
> > #include <linux/vmalloc.h>
> > @@ -32,7 +33,8 @@
> > enum {
> > VHOST_VSOCK_FEATURES = VHOST_FEATURES |
> > (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
> > - (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
> > + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
> > + (1ULL << VIRTIO_VSOCK_F_DGRAM)
> > };
> >
> > enum {
> > @@ -56,6 +58,7 @@ struct vhost_vsock {
> > atomic_t queued_replies;
> >
> > u32 guest_cid;
> > + bool dgram_allow;
> > bool seqpacket_allow;
> > };
> >
> > @@ -86,6 +89,32 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
> > return NULL;
> > }
> >
> > +/* Claims ownership of the skb, do not free the skb after calling! */
> > +static void
> > +vhost_transport_error(struct sk_buff *skb, int err)
> > +{
> > + struct sock_exterr_skb *serr;
> > + struct sock *sk = skb->sk;
> > + struct sk_buff *clone;
> > +
> > + serr = SKB_EXT_ERR(skb);
> > + memset(serr, 0, sizeof(*serr));
> > + serr->ee.ee_errno = err;
> > + serr->ee.ee_origin = SO_EE_ORIGIN_NONE;
> > +
> > + clone = skb_clone(skb, GFP_KERNEL);
>
> May for skb which is error carrier we can use 'sock_omalloc()', not 'skb_clone()' ? TCP uses skb
> allocated by this function as carriers of error structure. I guess 'skb_clone()' also clones data of origin,
> but i think that there is no need in data as we insert it to error queue of the socket.
>
> What do You think?
IIUC skb_clone() is often used in this scenario so that the user can
retrieve the error-causing packet from the error queue. Is there some
reason we shouldn't do this?
I'm seeing that the serr bits need to occur on the clone here, not the
original. I didn't realize the SKB_EXT_ERR() is a skb->cb cast. I'm not
actually sure how this passes the test case since ->cb isn't cloned.
>
> > + if (!clone)
> > + return;
>
> What will happen here 'if (!clone)' ? skb will leak as it was removed from queue?
>
Ah yes, true.
> > +
> > + if (sock_queue_err_skb(sk, clone))
> > + kfree_skb(clone);
> > +
> > + sk->sk_err = err;
> > + sk_error_report(sk);
> > +
> > + kfree_skb(skb);
> > +}
> > +
> > static void
> > vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > struct vhost_virtqueue *vq)
> > @@ -160,9 +189,15 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
> > hdr = virtio_vsock_hdr(skb);
> >
> > /* If the packet is greater than the space available in the
> > - * buffer, we split it using multiple buffers.
> > + * buffer, we split it using multiple buffers for connectible
> > + * sockets and drop the packet for datagram sockets.
> > */
> > if (payload_len > iov_len - sizeof(*hdr)) {
> > + if (le16_to_cpu(hdr->type) == VIRTIO_VSOCK_TYPE_DGRAM) {
> > + vhost_transport_error(skb, EHOSTUNREACH);
> > + continue;
> > + }
> > +
> > payload_len = iov_len - sizeof(*hdr);
> >
> > /* As we are copying pieces of large packet's buffer to
> > @@ -394,6 +429,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock)
> > return val < vq->num;
> > }
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port);
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid);
> >
> > static struct virtio_transport vhost_transport = {
> > @@ -410,7 +446,8 @@ static struct virtio_transport vhost_transport = {
> > .cancel_pkt = vhost_transport_cancel_pkt,
> >
> > .dgram_enqueue = virtio_transport_dgram_enqueue,
> > - .dgram_allow = virtio_transport_dgram_allow,
> > + .dgram_allow = vhost_transport_dgram_allow,
> > + .dgram_addr_init = virtio_transport_dgram_addr_init,
> >
> > .stream_enqueue = virtio_transport_stream_enqueue,
> > .stream_dequeue = virtio_transport_stream_dequeue,
> > @@ -443,6 +480,22 @@ static struct virtio_transport vhost_transport = {
> > .send_pkt = vhost_transport_send_pkt,
> > };
> >
> > +static bool vhost_transport_dgram_allow(u32 cid, u32 port)
> > +{
> > + struct vhost_vsock *vsock;
> > + bool dgram_allow = false;
> > +
> > + rcu_read_lock();
> > + vsock = vhost_vsock_get(cid);
> > +
> > + if (vsock)
> > + dgram_allow = vsock->dgram_allow;
> > +
> > + rcu_read_unlock();
> > +
> > + return dgram_allow;
> > +}
> > +
> > static bool vhost_transport_seqpacket_allow(u32 remote_cid)
> > {
> > struct vhost_vsock *vsock;
> > @@ -799,6 +852,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
> > if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET))
> > vsock->seqpacket_allow = true;
> >
> > + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM))
> > + vsock->dgram_allow = true;
> > +
> > for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) {
> > vq = &vsock->vqs[i];
> > mutex_lock(&vq->mutex);
> > diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> > index e73f3b2c52f1..449ed63ac2b0 100644
> > --- a/net/vmw_vsock/af_vsock.c
> > +++ b/net/vmw_vsock/af_vsock.c
> > @@ -1427,9 +1427,12 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
> > return prot->recvmsg(sk, msg, len, flags, NULL);
> > #endif
> >
> > - if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
> > + if (unlikely(flags & MSG_OOB))
> > return -EOPNOTSUPP;
> >
> > + if (unlikely(flags & MSG_ERRQUEUE))
> > + return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 0);
> > +
>
> Sorry, but I get build error here, because SOL_VSOCK in undefined. I think it should be added to
> include/linux/socket.h and to uapi files also for future use in userspace.
>
Strange, I built each patch individually without issue. My base is
netdev/main with your SOL_VSOCK patch applied. I will look today and see
if I'm missing something.
> Also Stefano Garzarella <sgarzare@redhat.com> suggested to add define something like VSOCK_RECVERR,
> in the same way as IP_RECVERR, and use it as last parameter of 'sock_recv_errqueue()'.
>
Got it, thanks.
> > transport = vsk->transport;
> >
> > /* Retrieve the head sk_buff from the socket's receive queue. */
> >
>
> Thanks, Arseniy
Thanks,
Bobby
next prev parent reply other threads:[~2023-07-26 17:55 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 0:50 [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 01/14] af_vsock: generalize vsock_dgram_recvmsg() to all transports Bobby Eshleman
2023-07-24 18:11 ` Arseniy Krasnov
2023-07-26 18:21 ` Bobby Eshleman
2023-07-27 7:53 ` Arseniy Krasnov
2023-07-19 0:50 ` [PATCH RFC net-next v5 02/14] af_vsock: refactor transport lookup code Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 03/14] af_vsock: support multi-transport datagrams Bobby Eshleman
2023-07-22 21:53 ` Arseniy Krasnov
2023-08-02 22:24 ` Bobby Eshleman
2023-08-03 0:53 ` Bobby Eshleman
2023-08-03 12:42 ` Stefano Garzarella
2023-08-03 12:42 ` Stefano Garzarella
2023-08-03 18:58 ` Bobby Eshleman
2023-08-04 14:11 ` Stefano Garzarella
2023-08-04 14:11 ` Stefano Garzarella
2023-08-04 17:01 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 04/14] af_vsock: generalize bind table functions Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 05/14] af_vsock: use a separate dgram bind table Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 06/14] virtio/vsock: add VIRTIO_VSOCK_TYPE_DGRAM Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 07/14] virtio/vsock: add common datagram send path Bobby Eshleman
2023-07-22 8:16 ` Arseniy Krasnov
2023-07-26 17:08 ` Bobby Eshleman
2023-07-27 7:57 ` Arseniy Krasnov
2023-08-02 20:08 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 08/14] af_vsock: add vsock_find_bound_dgram_socket() Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 09/14] virtio/vsock: add common datagram recv path Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 10/14] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Bobby Eshleman
2023-07-26 18:38 ` Michael S. Tsirkin
2023-07-26 18:38 ` Michael S. Tsirkin
2023-07-27 7:48 ` Stefano Garzarella
2023-07-27 7:48 ` Stefano Garzarella
2023-08-01 4:30 ` Bobby Eshleman
2023-08-01 16:04 ` Stefano Garzarella
2023-08-01 16:04 ` Stefano Garzarella
2023-07-19 0:50 ` [PATCH RFC net-next v5 11/14] vhost/vsock: implement datagram support Bobby Eshleman
2023-07-22 8:42 ` Arseniy Krasnov
2023-07-26 17:55 ` Bobby Eshleman [this message]
2023-07-27 8:00 ` Arseniy Krasnov
2023-08-02 21:23 ` Bobby Eshleman
2023-08-04 13:16 ` Arseniy Krasnov
2023-07-27 8:04 ` Arseniy Krasnov
2023-07-26 18:40 ` Michael S. Tsirkin
2023-07-26 18:40 ` Michael S. Tsirkin
2023-08-01 4:26 ` Bobby Eshleman
2023-08-02 19:28 ` Bobby Eshleman
2023-08-12 8:01 ` Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 12/14] vsock/loopback: " Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 13/14] virtio/vsock: " Bobby Eshleman
2023-07-22 8:45 ` Arseniy Krasnov
2023-07-26 17:58 ` Bobby Eshleman
2023-07-27 8:09 ` Arseniy Krasnov
2023-08-01 4:14 ` [PATCH RFC net-next v5 13/14] virtio/vsock: implement datagram supporty Bobby Eshleman
2023-07-19 0:50 ` [PATCH RFC net-next v5 14/14] test/vsock: add vsock dgram tests Bobby Eshleman
2023-07-22 8:51 ` [PATCH RFC net-next v5 00/14] virtio/vsock: support datagrams Arseniy Krasnov
2023-07-26 18:02 ` Bobby Eshleman
2023-07-27 7:51 ` Michael S. Tsirkin
2023-07-27 7:51 ` Michael S. Tsirkin
2023-08-01 5:30 ` Bobby Eshleman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZMFd+Jd/LrfpJsVA@bullseye \
--to=bobbyeshleman@gmail.com \
--cc=bobby.eshleman@bytedance.com \
--cc=bpf@vger.kernel.org \
--cc=bryantan@vmware.com \
--cc=dan.carpenter@linaro.org \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=edumazet@google.com \
--cc=haiyangz@microsoft.com \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=oxffffaa@gmail.com \
--cc=pabeni@redhat.com \
--cc=pv-drivers@vmware.com \
--cc=sgarzare@redhat.com \
--cc=simon.horman@corigine.com \
--cc=stefanha@redhat.com \
--cc=vdasa@vmware.com \
--cc=virtualization@lists.linux-foundation.org \
--cc=wei.liu@kernel.org \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.