Netdev List
From: Stefano Garzarella <sgarzare@redhat.com>
To: David Laight <david.laight.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	netdev@vger.kernel.org, "Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Arseniy Krasnov" <avkrasnov@salutedevices.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	kvm@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	virtualization@lists.linux.dev,
	"David S. Miller" <davem@davemloft.net>,
	"Jason Wang" <jasowang@redhat.com>,
	linux-kernel@vger.kernel.org,
	"Maher Azzouzi" <maherazz04@gmail.com>
Subject: Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
Date: Mon, 18 May 2026 13:08:52 +0200
Message-ID: <agrxFFkyvyB-kbvr@sgarzare-redhat>
In-Reply-To: <20260518115005.5f13bd2b@pumpkin>

On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>On Mon, 18 May 2026 11:54:19 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>> >> > On Thu, 14 May 2026 11:29:48 +0200
>> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >> >
>> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> >> > >
>> >> > > When a large message is fragmented into multiple skbs, the zerocopy
>> >> > > uarg is only allocated and attached to the last skb in the loop.
>> >> > > Non-final skbs carry pinned user pages with no completion tracking,
>> >> > > so the kernel has no way to notify userspace when those pages are safe
>> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>> >> > > leaking pinned pages with no completion notification.
>> >> > >
>> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
>> >> > > uarg (if not provided by the caller) before the send loop and attach
>> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> >> > > Each skb's completion properly decrements the refcount, and the
>> >> > > notification only fires after the last skb is freed.
>> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
>> >> > > net_zcopy_put_abort().
>> >> > >
>> >> > > This issue was initially discovered by sashiko while reviewing commit
>> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> >> > > but was pre-existing.
>> >> > >
>> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> >> > > ---
>> >> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> >> > >  1 file changed, 34 insertions(+), 49 deletions(-)
>> >> > >
>> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >> > > index 989cc252d3d3..1e3409d28164 100644
>> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> >> > >  	return true;
>> >> > >  }
>> >> > >
>> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> >> > > -					   struct sk_buff *skb,
>> >> > > -					   struct msghdr *msg,
>> >> > > -					   size_t pkt_len,
>> >> > > -					   bool zerocopy)
>> >> > > -{
>> >> > > -	struct ubuf_info *uarg;
>> >> > > -
>> >> > > -	if (msg->msg_ubuf) {
>> >> > > -		uarg = msg->msg_ubuf;
>> >> > > -		net_zcopy_get(uarg);
>> >> > > -	} else {
>> >> > > -		struct ubuf_info_msgzc *uarg_zc;
>> >> > > -
>> >> > > -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > -					    pkt_len, NULL, false);
>> >> > > -		if (!uarg)
>> >> > > -			return -1;
>> >> > > -
>> >> > > -		uarg_zc = uarg_to_msgzc(uarg);
>> >> > > -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> >> > > -	}
>> >> > > -
>> >> > > -	skb_zcopy_init(skb, uarg);
>> >> > > -
>> >> > > -	return 0;
>> >> > > -}
>> >> > > -
>> >> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
>> >> > >  				     struct virtio_vsock_pkt_info *info,
>> >> > >  				     size_t len,
>> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > >  	u32 src_cid, src_port, dst_cid, dst_port;
>> >> > >  	const struct virtio_transport *t_ops;
>> >> > >  	struct virtio_vsock_sock *vvs;
>> >> > > +	struct ubuf_info *uarg = NULL;
>> >> > >  	u32 pkt_len = info->pkt_len;
>> >> > >  	bool can_zcopy = false;
>> >> > > +	bool have_uref = false;
>> >> > >  	u32 rest_len;
>> >> > >  	int ret;
>> >> > >
>> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > >  		if (can_zcopy)
>> >> > >  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> >> > >  					    (MAX_SKB_FRAGS * PAGE_SIZE));
>> >> > > +
>> >> > > +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> >> > > +		    info->op == VIRTIO_VSOCK_OP_RW) {
>> >> > > +			uarg = info->msg->msg_ubuf;
>> >> > > +
>> >> > > +			if (!uarg) {
>> >> > > +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > +							    pkt_len, NULL, false);
>> >> > > +				if (!uarg) {
>> >> > > +					virtio_transport_put_credit(vvs, pkt_len);
>> >> > > +					return -ENOMEM;
>> >> > > +				}
>> >> > > +
>> >> > > +				if (!can_zcopy)
>> >> > > +					uarg_to_msgzc(uarg)->zerocopy = 0;
>> >> > > +
>> >> > > +				have_uref = true;
>> >> > > +			}
>> >> > > +		}
>> >> >
>> >> > Surely that block should only be done if can_zcopy is true?
>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false?
>> >> >
>> >> > If info->msg->msg_ubuf is already set then I think you have to disable zero-copy.
>> >> > The caller has already requested a callback - and you can't add another.
>> >> >
>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>> >>
>> >> I kept the same approach we had before, trying to make as few changes as
>> >> possible.
>> >>
>> >> All these potential issues seem to be pre-existing and should eventually be
>> >> addressed in other patches IMHO. This patch only resolves the main issue
>> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>> >
>> >The patch is upstream now, right? So any fixes pretty much have to be
>> >patches on top.
>>
>> If those are actual issues, then yes. TBH, I didn’t look into that
>> aspect and left it as it was before. We should take a closer look at how
>> MSG_ZEROCOPY is handled.
>>
>> David, if you think it needs fixing and you have time, feel free to send
>> patches on top.
>
>I'm not fully sure how it all works.

Same here, so I pinged Arseniy who worked on that, since it seemed 
deliberate to have `can_zcopy` (and set `uarg->zerocopy` accordingly) 
only when it was supported by the transport.

>Especially the paths where msg->msg_ubuf is non-NULL; I suspect it should
>be added to all the skbs even if the ZEROCOPY flag isn't set.
>I was just reading the one function.
>But there did seem to be some very dodgy conditionals.

I see, let's wait for Arseniy's feedback; otherwise, I'll try to fix it 
next week. As mentioned, this issue existed before this patch, so it 
shouldn't be a regression.

Thanks,
Stefano