Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>,
	Cong Wang <cong.wang@bytedance.com>,
	Jiang Wang <jiang.wang@bytedance.com>,
	Krasnov Arseniy <oxffffaa@gmail.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jason Wang <jasowang@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff
Date: Mon, 21 Nov 2022 12:01:12 +0000	[thread overview]
Message-ID: <Y3toiPtBgOcrb8TL@bullseye> (raw)
In-Reply-To: <863a58452b4a4c0d63a41b0f78b59d32919067fa.camel@redhat.com>

On Tue, Dec 06, 2022 at 11:20:21AM +0100, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-12-02 at 09:35 -0800, Bobby Eshleman wrote:
> [...]
> > diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
> > index 35d7eedb5e8e..6c0b2d4da3fe 100644
> > --- a/include/linux/virtio_vsock.h
> > +++ b/include/linux/virtio_vsock.h
> > @@ -3,10 +3,129 @@
> >  #define _LINUX_VIRTIO_VSOCK_H
> >  
> >  #include <uapi/linux/virtio_vsock.h>
> > +#include <linux/bits.h>
> >  #include <linux/socket.h>
> >  #include <net/sock.h>
> >  #include <net/af_vsock.h>
> >  
> > +#define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))
> > +
> > +enum virtio_vsock_skb_flags {
> > +	VIRTIO_VSOCK_SKB_FLAGS_REPLY		= BIT(0),
> > +	VIRTIO_VSOCK_SKB_FLAGS_TAP_DELIVERED	= BIT(1),
> > +};
> > +
> > +static inline struct virtio_vsock_hdr *virtio_vsock_hdr(struct sk_buff *skb)
> > +{
> > +	return (struct virtio_vsock_hdr *)skb->head;
> > +}
> > +
> > +static inline bool virtio_vsock_skb_reply(struct sk_buff *skb)
> > +{
> > +	return skb->_skb_refdst & VIRTIO_VSOCK_SKB_FLAGS_REPLY;
> > +}
> 
> I'm sorry for the late feedback. The above is extremelly risky: if the
> skb will land later into the networking stack, we could experience the
> most difficult to track bugs.
> 
> You should use the skb control buffer instead (skb->cb), with the
> additional benefit you could use e.g. bool - the compiler could emit
> better code to manipulate such fields - and you will not need to clear
> the field before release nor enqueue.
> 
> [...]
> 

Hey Paolo, thank you for the review. For my own learning, this would
happen presumably when the skb is dropped? And I assume we don't see
this in sockmap because it is always cleared before leaving sockmap's
hands? I sanity checked this patch with an out-of-tree patch I have that
uses the networking stack, but I suspect I didn't see issues because my
test harness didn't induce dropping...

I originally avoided skb->cb because the reply flag is set at allocation
and would potentially be clobbered by a pass through the networking
stack. The reply flag would be used after a pass through the networking
stack (e.g., during transmission at the device level and when sockets
close while skbs are still queued for xmit).

I suppose using skb->cb would look like something like this:
- use skb_clone() for reply skbs
- set reply on the cloned sk_buff skb->cb
- keep a hashmap mapping original skb to cloned skb (is there a better
  way?)
- when choosing to apply reply logic, if skb_cloned() refer to the
  hashmap

Is there a better/simpler way to maintain skb->cb?

> > @@ -352,37 +360,38 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
> >  				   size_t len)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt;
> >  	size_t bytes, total = 0;
> > -	u32 free_space;
> > +	struct sk_buff *skb;
> >  	int err = -EFAULT;
> > +	u32 free_space;
> >  
> >  	spin_lock_bh(&vvs->rx_lock);
> > -	while (total < len && !list_empty(&vvs->rx_queue)) {
> > -		pkt = list_first_entry(&vvs->rx_queue,
> > -				       struct virtio_vsock_pkt, list);
> > +	while (total < len && !skb_queue_empty_lockless(&vvs->rx_queue)) {
> > +		skb = __skb_dequeue(&vvs->rx_queue);
> 
> Here the locking schema is confusing. It looks like vvs->rx_queue is
> under vvs->rx_lock protection, so the above should be skb_queue_empty()
> instead of the lockless variant.
> 
> [...]
> 
> > @@ -858,16 +873,11 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
> >  static void virtio_transport_remove_sock(struct vsock_sock *vsk)
> >  {
> >  	struct virtio_vsock_sock *vvs = vsk->trans;
> > -	struct virtio_vsock_pkt *pkt, *tmp;
> >  
> >  	/* We don't need to take rx_lock, as the socket is closing and we are
> >  	 * removing it.
> >  	 */
> > -	list_for_each_entry_safe(pkt, tmp, &vvs->rx_queue, list) {
> > -		list_del(&pkt->list);
> > -		virtio_transport_free_pkt(pkt);
> > -	}
> > -
> > +	virtio_vsock_skb_queue_purge(&vvs->rx_queue);
> 
> Still assuming rx_queue is under the rx_lock, given you don't need the
> locking here as per the above comment, you should use the lockless
> purge variant.
> 

Good catch, thanks!

Thanks again,
Bobby

next prev parent reply	other threads:[~2022-12-06 18:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-02 17:35 [PATCH v5] virtio/vsock: replace virtio_vsock_pkt with sk_buff Bobby Eshleman
2022-12-05 12:22 ` Stefano Garzarella
2022-12-05 12:22   ` Stefano Garzarella
2022-12-06  1:37   ` Jakub Kicinski
2022-11-21 11:02     ` Bobby Eshleman
2022-12-06 10:20 ` Paolo Abeni
2022-12-06 10:20   ` Paolo Abeni
2022-11-21 12:01   ` Bobby Eshleman [this message]
2022-12-07  9:33     ` Paolo Abeni
2022-12-07  9:33       ` Paolo Abeni
2022-11-21 20:19       ` Bobby Eshleman
2022-12-06 22:15 ` Michael S. Tsirkin
2022-12-06 22:15   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y3toiPtBgOcrb8TL@bullseye \
    --to=bobbyeshleman@gmail.com \
    --cc=bobby.eshleman@bytedance.com \
    --cc=cong.wang@bytedance.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jasowang@redhat.com \
    --cc=jiang.wang@bytedance.com \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=oxffffaa@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.