Netdev List
 help / color / mirror / Atom feed
* [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
@ 2026-05-14  9:29 Stefano Garzarella
  2026-05-14 14:07 ` Michael S. Tsirkin
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-14  9:29 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Paolo Abeni, Simon Horman, Arseniy Krasnov,
	Stefan Hajnoczi, Stefano Garzarella, kvm, Eric Dumazet,
	Eugenio Pérez, Michael S. Tsirkin, Xuan Zhuo, virtualization,
	David S. Miller, Jason Wang, linux-kernel, Maher Azzouzi

From: Stefano Garzarella <sgarzare@redhat.com>

When a large message is fragmented into multiple skbs, the zerocopy
uarg is only allocated and attached to the last skb in the loop.
Non-final skbs carry pinned user pages with no completion tracking,
so the kernel has no way to notify userspace when those pages are safe
to reuse. If the loop breaks early the uarg is never allocated at all,
leaking pinned pages with no completion notification.

Fix this by following the approach used by TCP: allocate the zerocopy
uarg (if not provided by the caller) before the send loop and attach
it to every skb via skb_zcopy_set(), which takes a reference per skb.
Each skb's completion properly decrements the refcount, and the
notification only fires after the last skb is freed.
On failure, if no data was sent, the uarg is cleanly aborted via
net_zcopy_put_abort().

This issue was initially discovered by sashiko while reviewing commit
1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
but was pre-existing.

Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
Reported-by: Maher Azzouzi <maherazz04@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
 1 file changed, 34 insertions(+), 49 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 989cc252d3d3..1e3409d28164 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
 	return true;
 }
 
-static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
-					   struct sk_buff *skb,
-					   struct msghdr *msg,
-					   size_t pkt_len,
-					   bool zerocopy)
-{
-	struct ubuf_info *uarg;
-
-	if (msg->msg_ubuf) {
-		uarg = msg->msg_ubuf;
-		net_zcopy_get(uarg);
-	} else {
-		struct ubuf_info_msgzc *uarg_zc;
-
-		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
-					    pkt_len, NULL, false);
-		if (!uarg)
-			return -1;
-
-		uarg_zc = uarg_to_msgzc(uarg);
-		uarg_zc->zerocopy = zerocopy ? 1 : 0;
-	}
-
-	skb_zcopy_init(skb, uarg);
-
-	return 0;
-}
-
 static int virtio_transport_fill_skb(struct sk_buff *skb,
 				     struct virtio_vsock_pkt_info *info,
 				     size_t len,
@@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	u32 src_cid, src_port, dst_cid, dst_port;
 	const struct virtio_transport *t_ops;
 	struct virtio_vsock_sock *vvs;
+	struct ubuf_info *uarg = NULL;
 	u32 pkt_len = info->pkt_len;
 	bool can_zcopy = false;
+	bool have_uref = false;
 	u32 rest_len;
 	int ret;
 
@@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 		if (can_zcopy)
 			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
 					    (MAX_SKB_FRAGS * PAGE_SIZE));
+
+		if (info->msg->msg_flags & MSG_ZEROCOPY &&
+		    info->op == VIRTIO_VSOCK_OP_RW) {
+			uarg = info->msg->msg_ubuf;
+
+			if (!uarg) {
+				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
+							    pkt_len, NULL, false);
+				if (!uarg) {
+					virtio_transport_put_credit(vvs, pkt_len);
+					return -ENOMEM;
+				}
+
+				if (!can_zcopy)
+					uarg_to_msgzc(uarg)->zerocopy = 0;
+
+				have_uref = true;
+			}
+		}
 	}
 
 	rest_len = pkt_len;
@@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 			break;
 		}
 
-		/* We process buffer part by part, allocating skb on
-		 * each iteration. If this is last skb for this buffer
-		 * and MSG_ZEROCOPY mode is in use - we must allocate
-		 * completion for the current syscall.
-		 *
-		 * Pass pkt_len because msg iter is already consumed
-		 * by virtio_transport_fill_skb(), so iter->count
-		 * can not be used for RLIMIT_MEMLOCK pinned-pages
-		 * accounting done by msg_zerocopy_realloc().
-		 */
-		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
-		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
-			if (virtio_transport_init_zcopy_skb(vsk, skb,
-							    info->msg,
-							    pkt_len,
-							    can_zcopy)) {
-				kfree_skb(skb);
-				ret = -ENOMEM;
-				break;
-			}
-		}
+		skb_zcopy_set(skb, uarg, NULL);
 
 		virtio_transport_inc_tx_pkt(vvs, skb);
 
@@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 
 	virtio_transport_put_credit(vvs, rest_len);
 
+	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
+	 * skb_zcopy_set() increases it for each skb, so we can drop that
+	 * initial reference to keep it balanced.
+	 */
+	if (have_uref) {
+		if (rest_len == pkt_len)
+			/* No data sent, abort the notification. */
+			net_zcopy_put_abort(uarg, true);
+		else
+			net_zcopy_put(uarg);
+	}
+
 	/* Return number of bytes, if any data has been sent. */
 	if (rest_len != pkt_len)
 		ret = pkt_len - rest_len;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-14  9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
@ 2026-05-14 14:07 ` Michael S. Tsirkin
  2026-05-15 17:18 ` Arseniy Krasnov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2026-05-14 14:07 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Thu, May 14, 2026 at 11:29:48AM +0200, Stefano Garzarella wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
> 
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
> 
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
> 
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
> 
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>  1 file changed, 34 insertions(+), 49 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>  	return true;
>  }
>  
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> -					   struct sk_buff *skb,
> -					   struct msghdr *msg,
> -					   size_t pkt_len,
> -					   bool zerocopy)
> -{
> -	struct ubuf_info *uarg;
> -
> -	if (msg->msg_ubuf) {
> -		uarg = msg->msg_ubuf;
> -		net_zcopy_get(uarg);
> -	} else {
> -		struct ubuf_info_msgzc *uarg_zc;
> -
> -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> -					    pkt_len, NULL, false);
> -		if (!uarg)
> -			return -1;
> -
> -		uarg_zc = uarg_to_msgzc(uarg);
> -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> -	}
> -
> -	skb_zcopy_init(skb, uarg);
> -
> -	return 0;
> -}
> -
>  static int virtio_transport_fill_skb(struct sk_buff *skb,
>  				     struct virtio_vsock_pkt_info *info,
>  				     size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	u32 src_cid, src_port, dst_cid, dst_port;
>  	const struct virtio_transport *t_ops;
>  	struct virtio_vsock_sock *vvs;
> +	struct ubuf_info *uarg = NULL;
>  	u32 pkt_len = info->pkt_len;
>  	bool can_zcopy = false;
> +	bool have_uref = false;
>  	u32 rest_len;
>  	int ret;
>  
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  		if (can_zcopy)
>  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> +		    info->op == VIRTIO_VSOCK_OP_RW) {
> +			uarg = info->msg->msg_ubuf;
> +
> +			if (!uarg) {
> +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> +							    pkt_len, NULL, false);
> +				if (!uarg) {
> +					virtio_transport_put_credit(vvs, pkt_len);
> +					return -ENOMEM;
> +				}
> +
> +				if (!can_zcopy)
> +					uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> +				have_uref = true;
> +			}
> +		}
>  	}
>  
>  	rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  			break;
>  		}
>  
> -		/* We process buffer part by part, allocating skb on
> -		 * each iteration. If this is last skb for this buffer
> -		 * and MSG_ZEROCOPY mode is in use - we must allocate
> -		 * completion for the current syscall.
> -		 *
> -		 * Pass pkt_len because msg iter is already consumed
> -		 * by virtio_transport_fill_skb(), so iter->count
> -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
> -		 * accounting done by msg_zerocopy_realloc().
> -		 */
> -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> -			if (virtio_transport_init_zcopy_skb(vsk, skb,
> -							    info->msg,
> -							    pkt_len,
> -							    can_zcopy)) {
> -				kfree_skb(skb);
> -				ret = -ENOMEM;
> -				break;
> -			}
> -		}
> +		skb_zcopy_set(skb, uarg, NULL);
>  
>  		virtio_transport_inc_tx_pkt(vvs, skb);
>  
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  
>  	virtio_transport_put_credit(vvs, rest_len);
>  
> +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> +	 * skb_zcopy_set() increases it for each skb, so we can drop that
> +	 * initial reference to keep it balanced.
> +	 */
> +	if (have_uref) {
> +		if (rest_len == pkt_len)
> +			/* No data sent, abort the notification. */
> +			net_zcopy_put_abort(uarg, true);
> +		else
> +			net_zcopy_put(uarg);
> +	}
> +
>  	/* Return number of bytes, if any data has been sent. */
>  	if (rest_len != pkt_len)
>  		ret = pkt_len - rest_len;
> -- 
> 2.54.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-14  9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
  2026-05-14 14:07 ` Michael S. Tsirkin
@ 2026-05-15 17:18 ` Arseniy Krasnov
  2026-05-16  0:50 ` patchwork-bot+netdevbpf
  2026-05-16 11:53 ` David Laight
  3 siblings, 0 replies; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-15 17:18 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Jakub Kicinski, Paolo Abeni,, Simon Horman,
	Stefan Hajnoczi, kvm, Eric Dumazet, Xuan Zhuo,, virtualization,
	David S., Miller, Jason Wang, linux-kernel, Maher Azzouzi,
	eperezma

> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
>
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
>
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
>
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>


Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>


> ---
>  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>  1 file changed, 34 insertions(+), 49 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>  	return true;
>  }
>  
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> -					   struct sk_buff *skb,
> -					   struct msghdr *msg,
> -					   size_t pkt_len,
> -					   bool zerocopy)
> -{
> -	struct ubuf_info *uarg;
> -
> -	if (msg->msg_ubuf) {
> -		uarg = msg->msg_ubuf;
> -		net_zcopy_get(uarg);
> -	} else {
> -		struct ubuf_info_msgzc *uarg_zc;
> -
> -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> -					    pkt_len, NULL, false);
> -		if (!uarg)
> -			return -1;
> -
> -		uarg_zc = uarg_to_msgzc(uarg);
> -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> -	}
> -
> -	skb_zcopy_init(skb, uarg);
> -
> -	return 0;
> -}
> -
>  static int virtio_transport_fill_skb(struct sk_buff *skb,
>  				     struct virtio_vsock_pkt_info *info,
>  				     size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	u32 src_cid, src_port, dst_cid, dst_port;
>  	const struct virtio_transport *t_ops;
>  	struct virtio_vsock_sock *vvs;
> +	struct ubuf_info *uarg = NULL;
>  	u32 pkt_len = info->pkt_len;
>  	bool can_zcopy = false;
> +	bool have_uref = false;
>  	u32 rest_len;
>  	int ret;
>  
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  		if (can_zcopy)
>  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> +		    info->op == VIRTIO_VSOCK_OP_RW) {
> +			uarg = info->msg->msg_ubuf;
> +
> +			if (!uarg) {
> +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> +							    pkt_len, NULL, false);
> +				if (!uarg) {
> +					virtio_transport_put_credit(vvs, pkt_len);
> +					return -ENOMEM;
> +				}
> +
> +				if (!can_zcopy)
> +					uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> +				have_uref = true;
> +			}
> +		}
>  	}
>  
>  	rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  			break;
>  		}
>  
> -		/* We process buffer part by part, allocating skb on
> -		 * each iteration. If this is last skb for this buffer
> -		 * and MSG_ZEROCOPY mode is in use - we must allocate
> -		 * completion for the current syscall.
> -		 *
> -		 * Pass pkt_len because msg iter is already consumed
> -		 * by virtio_transport_fill_skb(), so iter->count
> -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
> -		 * accounting done by msg_zerocopy_realloc().
> -		 */
> -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> -			if (virtio_transport_init_zcopy_skb(vsk, skb,
> -							    info->msg,
> -							    pkt_len,
> -							    can_zcopy)) {
> -				kfree_skb(skb);
> -				ret = -ENOMEM;
> -				break;
> -			}
> -		}
> +		skb_zcopy_set(skb, uarg, NULL);
>  
>  		virtio_transport_inc_tx_pkt(vvs, skb);
>  
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  
>  	virtio_transport_put_credit(vvs, rest_len);
>  
> +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> +	 * skb_zcopy_set() increases it for each skb, so we can drop that
> +	 * initial reference to keep it balanced.
> +	 */
> +	if (have_uref) {
> +		if (rest_len == pkt_len)
> +			/* No data sent, abort the notification. */
> +			net_zcopy_put_abort(uarg, true);
> +		else
> +			net_zcopy_put(uarg);
> +	}
> +
>  	/* Return number of bytes, if any data has been sent. */
>  	if (rest_len != pkt_len)
>  		ret = pkt_len - rest_len;
> -- 
> 2.54.0
>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-14  9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
  2026-05-14 14:07 ` Michael S. Tsirkin
  2026-05-15 17:18 ` Arseniy Krasnov
@ 2026-05-16  0:50 ` patchwork-bot+netdevbpf
  2026-05-16 11:53 ` David Laight
  3 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-16  0:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, kuba, pabeni, horms, avkrasnov, stefanha, kvm, edumazet,
	eperezma, mst, xuanzhuo, virtualization, davem, jasowang,
	linux-kernel, maherazz04

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 14 May 2026 11:29:48 +0200 you wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
> 
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
> 
> [...]

Here is the summary with links:
  - [net] vsock/virtio: fix zerocopy completion for multi-skb sends
    https://git.kernel.org/netdev/net/c/ae38d9179190

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-14  9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
                   ` (2 preceding siblings ...)
  2026-05-16  0:50 ` patchwork-bot+netdevbpf
@ 2026-05-16 11:53 ` David Laight
  2026-05-18  9:18   ` Stefano Garzarella
  3 siblings, 1 reply; 13+ messages in thread
From: David Laight @ 2026-05-16 11:53 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Michael S. Tsirkin, Xuan Zhuo, virtualization,
	David S. Miller, Jason Wang, linux-kernel, Maher Azzouzi

On Thu, 14 May 2026 11:29:48 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:

> From: Stefano Garzarella <sgarzare@redhat.com>
> 
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
> 
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
> 
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
> 
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>  1 file changed, 34 insertions(+), 49 deletions(-)
> 
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>  	return true;
>  }
>  
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> -					   struct sk_buff *skb,
> -					   struct msghdr *msg,
> -					   size_t pkt_len,
> -					   bool zerocopy)
> -{
> -	struct ubuf_info *uarg;
> -
> -	if (msg->msg_ubuf) {
> -		uarg = msg->msg_ubuf;
> -		net_zcopy_get(uarg);
> -	} else {
> -		struct ubuf_info_msgzc *uarg_zc;
> -
> -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> -					    pkt_len, NULL, false);
> -		if (!uarg)
> -			return -1;
> -
> -		uarg_zc = uarg_to_msgzc(uarg);
> -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> -	}
> -
> -	skb_zcopy_init(skb, uarg);
> -
> -	return 0;
> -}
> -
>  static int virtio_transport_fill_skb(struct sk_buff *skb,
>  				     struct virtio_vsock_pkt_info *info,
>  				     size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  	u32 src_cid, src_port, dst_cid, dst_port;
>  	const struct virtio_transport *t_ops;
>  	struct virtio_vsock_sock *vvs;
> +	struct ubuf_info *uarg = NULL;
>  	u32 pkt_len = info->pkt_len;
>  	bool can_zcopy = false;
> +	bool have_uref = false;
>  	u32 rest_len;
>  	int ret;
>  
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  		if (can_zcopy)
>  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> +		    info->op == VIRTIO_VSOCK_OP_RW) {
> +			uarg = info->msg->msg_ubuf;
> +
> +			if (!uarg) {
> +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> +							    pkt_len, NULL, false);
> +				if (!uarg) {
> +					virtio_transport_put_credit(vvs, pkt_len);
> +					return -ENOMEM;
> +				}
> +
> +				if (!can_zcopy)
> +					uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> +				have_uref = true;
> +			}
> +		}

Surely that block should only be done if can_zcopy is true?
And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.

It info->msg->msg_buf is already set then I think you have to disable zero-copy.
The caller has already requested a callback - and you can't add another.

In any case by the end of this can_zcopy and have_uref are really the same flag.

>  	}
>  
>  	rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  			break;
>  		}
>  
> -		/* We process buffer part by part, allocating skb on
> -		 * each iteration. If this is last skb for this buffer
> -		 * and MSG_ZEROCOPY mode is in use - we must allocate
> -		 * completion for the current syscall.
> -		 *
> -		 * Pass pkt_len because msg iter is already consumed
> -		 * by virtio_transport_fill_skb(), so iter->count
> -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
> -		 * accounting done by msg_zerocopy_realloc().
> -		 */
> -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> -			if (virtio_transport_init_zcopy_skb(vsk, skb,
> -							    info->msg,
> -							    pkt_len,
> -							    can_zcopy)) {
> -				kfree_skb(skb);
> -				ret = -ENOMEM;
> -				break;
> -			}
> -		}
> +		skb_zcopy_set(skb, uarg, NULL);
>  
>  		virtio_transport_inc_tx_pkt(vvs, skb);
>  
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>  
>  	virtio_transport_put_credit(vvs, rest_len);
>  
> +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> +	 * skb_zcopy_set() increases it for each skb, so we can drop that
                                                            ^ must

> +	 * initial reference to keep it balanced.
> +	 */
> +	if (have_uref) {
> +		if (rest_len == pkt_len)
> +			/* No data sent, abort the notification. */
> +			net_zcopy_put_abort(uarg, true);

Is it worth optimising for the 'nothing sent' case ?

-- David

> +		else
> +			net_zcopy_put(uarg);
> +	}
> +
>  	/* Return number of bytes, if any data has been sent. */
>  	if (rest_len != pkt_len)
>  		ret = pkt_len - rest_len;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-16 11:53 ` David Laight
@ 2026-05-18  9:18   ` Stefano Garzarella
  2026-05-18  9:33     ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18  9:18 UTC (permalink / raw)
  To: David Laight
  Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Michael S. Tsirkin, Xuan Zhuo, virtualization,
	David S. Miller, Jason Wang, linux-kernel, Maher Azzouzi

On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>On Thu, 14 May 2026 11:29:48 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> From: Stefano Garzarella <sgarzare@redhat.com>
>>
>> When a large message is fragmented into multiple skbs, the zerocopy
>> uarg is only allocated and attached to the last skb in the loop.
>> Non-final skbs carry pinned user pages with no completion tracking,
>> so the kernel has no way to notify userspace when those pages are safe
>> to reuse. If the loop breaks early the uarg is never allocated at all,
>> leaking pinned pages with no completion notification.
>>
>> Fix this by following the approach used by TCP: allocate the zerocopy
>> uarg (if not provided by the caller) before the send loop and attach
>> it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> Each skb's completion properly decrements the refcount, and the
>> notification only fires after the last skb is freed.
>> On failure, if no data was sent, the uarg is cleanly aborted via
>> net_zcopy_put_abort().
>>
>> This issue was initially discovered by sashiko while reviewing commit
>> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> but was pre-existing.
>>
>> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>>  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>>  1 file changed, 34 insertions(+), 49 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 989cc252d3d3..1e3409d28164 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>>  	return true;
>>  }
>>
>> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> -					   struct sk_buff *skb,
>> -					   struct msghdr *msg,
>> -					   size_t pkt_len,
>> -					   bool zerocopy)
>> -{
>> -	struct ubuf_info *uarg;
>> -
>> -	if (msg->msg_ubuf) {
>> -		uarg = msg->msg_ubuf;
>> -		net_zcopy_get(uarg);
>> -	} else {
>> -		struct ubuf_info_msgzc *uarg_zc;
>> -
>> -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> -					    pkt_len, NULL, false);
>> -		if (!uarg)
>> -			return -1;
>> -
>> -		uarg_zc = uarg_to_msgzc(uarg);
>> -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> -	}
>> -
>> -	skb_zcopy_init(skb, uarg);
>> -
>> -	return 0;
>> -}
>> -
>>  static int virtio_transport_fill_skb(struct sk_buff *skb,
>>  				     struct virtio_vsock_pkt_info *info,
>>  				     size_t len,
>> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>  	u32 src_cid, src_port, dst_cid, dst_port;
>>  	const struct virtio_transport *t_ops;
>>  	struct virtio_vsock_sock *vvs;
>> +	struct ubuf_info *uarg = NULL;
>>  	u32 pkt_len = info->pkt_len;
>>  	bool can_zcopy = false;
>> +	bool have_uref = false;
>>  	u32 rest_len;
>>  	int ret;
>>
>> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>  		if (can_zcopy)
>>  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>>  					    (MAX_SKB_FRAGS * PAGE_SIZE));
>> +
>> +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> +		    info->op == VIRTIO_VSOCK_OP_RW) {
>> +			uarg = info->msg->msg_ubuf;
>> +
>> +			if (!uarg) {
>> +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> +							    pkt_len, NULL, false);
>> +				if (!uarg) {
>> +					virtio_transport_put_credit(vvs, pkt_len);
>> +					return -ENOMEM;
>> +				}
>> +
>> +				if (!can_zcopy)
>> +					uarg_to_msgzc(uarg)->zerocopy = 0;
>> +
>> +				have_uref = true;
>> +			}
>> +		}
>
>Surely that block should only be done if can_zcopy is true?
>And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>
>It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>The caller has already requested a callback - and you can't add another.
>
>In any case by the end of this can_zcopy and have_uref are really the same flag.

I kept the same approach we had before, trying to make as few changes as 
possible.

All these potential issues seem to be pre-existing and should be 
eventually  addressed in other patches IMHO. This patch one only 
resolves the main issue of calling `skb_zcopy_set()` for every skb to 
avoid leaking pages, etc.

@Arseniy can you help on this?

>
>>  	}
>>
>>  	rest_len = pkt_len;
>> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>  			break;
>>  		}
>>
>> -		/* We process buffer part by part, allocating skb on
>> -		 * each iteration. If this is last skb for this buffer
>> -		 * and MSG_ZEROCOPY mode is in use - we must allocate
>> -		 * completion for the current syscall.
>> -		 *
>> -		 * Pass pkt_len because msg iter is already consumed
>> -		 * by virtio_transport_fill_skb(), so iter->count
>> -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
>> -		 * accounting done by msg_zerocopy_realloc().
>> -		 */
>> -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>> -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>> -			if (virtio_transport_init_zcopy_skb(vsk, skb,
>> -							    info->msg,
>> -							    pkt_len,
>> -							    can_zcopy)) {
>> -				kfree_skb(skb);
>> -				ret = -ENOMEM;
>> -				break;
>> -			}
>> -		}
>> +		skb_zcopy_set(skb, uarg, NULL);
>>
>>  		virtio_transport_inc_tx_pkt(vvs, skb);
>>
>> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>
>>  	virtio_transport_put_credit(vvs, rest_len);
>>
>> +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
>> +	 * skb_zcopy_set() increases it for each skb, so we can drop that
>                                                            ^ must
>
>> +	 * initial reference to keep it balanced.
>> +	 */
>> +	if (have_uref) {
>> +		if (rest_len == pkt_len)
>> +			/* No data sent, abort the notification. */
>> +			net_zcopy_put_abort(uarg, true);
>
>Is it worth optimising for the 'nothing sent' case ?

What do you suggest doing?

I followed what TCP does.

Thanks,
Stefano

>
>-- David
>
>> +		else
>> +			net_zcopy_put(uarg);
>> +	}
>> +
>>  	/* Return number of bytes, if any data has been sent. */
>>  	if (rest_len != pkt_len)
>>  		ret = pkt_len - rest_len;
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-18  9:18   ` Stefano Garzarella
@ 2026-05-18  9:33     ` Michael S. Tsirkin
  2026-05-18  9:54       ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2026-05-18  9:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Laight, netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
> > On Thu, 14 May 2026 11:29:48 +0200
> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> > 
> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > > 
> > > When a large message is fragmented into multiple skbs, the zerocopy
> > > uarg is only allocated and attached to the last skb in the loop.
> > > Non-final skbs carry pinned user pages with no completion tracking,
> > > so the kernel has no way to notify userspace when those pages are safe
> > > to reuse. If the loop breaks early the uarg is never allocated at all,
> > > leaking pinned pages with no completion notification.
> > > 
> > > Fix this by following the approach used by TCP: allocate the zerocopy
> > > uarg (if not provided by the caller) before the send loop and attach
> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
> > > Each skb's completion properly decrements the refcount, and the
> > > notification only fires after the last skb is freed.
> > > On failure, if no data was sent, the uarg is cleanly aborted via
> > > net_zcopy_put_abort().
> > > 
> > > This issue was initially discovered by sashiko while reviewing commit
> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> > > but was pre-existing.
> > > 
> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> > >  1 file changed, 34 insertions(+), 49 deletions(-)
> > > 
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 989cc252d3d3..1e3409d28164 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> > >  	return true;
> > >  }
> > > 
> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> > > -					   struct sk_buff *skb,
> > > -					   struct msghdr *msg,
> > > -					   size_t pkt_len,
> > > -					   bool zerocopy)
> > > -{
> > > -	struct ubuf_info *uarg;
> > > -
> > > -	if (msg->msg_ubuf) {
> > > -		uarg = msg->msg_ubuf;
> > > -		net_zcopy_get(uarg);
> > > -	} else {
> > > -		struct ubuf_info_msgzc *uarg_zc;
> > > -
> > > -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > -					    pkt_len, NULL, false);
> > > -		if (!uarg)
> > > -			return -1;
> > > -
> > > -		uarg_zc = uarg_to_msgzc(uarg);
> > > -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> > > -	}
> > > -
> > > -	skb_zcopy_init(skb, uarg);
> > > -
> > > -	return 0;
> > > -}
> > > -
> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
> > >  				     struct virtio_vsock_pkt_info *info,
> > >  				     size_t len,
> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >  	u32 src_cid, src_port, dst_cid, dst_port;
> > >  	const struct virtio_transport *t_ops;
> > >  	struct virtio_vsock_sock *vvs;
> > > +	struct ubuf_info *uarg = NULL;
> > >  	u32 pkt_len = info->pkt_len;
> > >  	bool can_zcopy = false;
> > > +	bool have_uref = false;
> > >  	u32 rest_len;
> > >  	int ret;
> > > 
> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >  		if (can_zcopy)
> > >  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> > >  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> > > +
> > > +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> > > +		    info->op == VIRTIO_VSOCK_OP_RW) {
> > > +			uarg = info->msg->msg_ubuf;
> > > +
> > > +			if (!uarg) {
> > > +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > +							    pkt_len, NULL, false);
> > > +				if (!uarg) {
> > > +					virtio_transport_put_credit(vvs, pkt_len);
> > > +					return -ENOMEM;
> > > +				}
> > > +
> > > +				if (!can_zcopy)
> > > +					uarg_to_msgzc(uarg)->zerocopy = 0;
> > > +
> > > +				have_uref = true;
> > > +			}
> > > +		}
> > 
> > Surely that block should only be done if can_zcopy is true?
> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
> > 
> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
> > The caller has already requested a callback - and you can't add another.
> > 
> > In any case by the end of this can_zcopy and have_uref are really the same flag.
> 
> I kept the same approach we had before, trying to make as few changes as
> possible.
> 
> All these potential issues seem to be pre-existing and should be eventually
> addressed in other patches IMHO. This patch one only resolves the main issue
> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.

the patch is upstream now, right? So pretty much have to be patches on
top.

> @Arseniy can you help on this?
> 
> > 
> > >  	}
> > > 
> > >  	rest_len = pkt_len;
> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >  			break;
> > >  		}
> > > 
> > > -		/* We process buffer part by part, allocating skb on
> > > -		 * each iteration. If this is last skb for this buffer
> > > -		 * and MSG_ZEROCOPY mode is in use - we must allocate
> > > -		 * completion for the current syscall.
> > > -		 *
> > > -		 * Pass pkt_len because msg iter is already consumed
> > > -		 * by virtio_transport_fill_skb(), so iter->count
> > > -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
> > > -		 * accounting done by msg_zerocopy_realloc().
> > > -		 */
> > > -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> > > -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> > > -			if (virtio_transport_init_zcopy_skb(vsk, skb,
> > > -							    info->msg,
> > > -							    pkt_len,
> > > -							    can_zcopy)) {
> > > -				kfree_skb(skb);
> > > -				ret = -ENOMEM;
> > > -				break;
> > > -			}
> > > -		}
> > > +		skb_zcopy_set(skb, uarg, NULL);
> > > 
> > >  		virtio_transport_inc_tx_pkt(vvs, skb);
> > > 
> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > 
> > >  	virtio_transport_put_credit(vvs, rest_len);
> > > 
> > > +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> > > +	 * skb_zcopy_set() increases it for each skb, so we can drop that
> >                                                            ^ must
> > 
> > > +	 * initial reference to keep it balanced.
> > > +	 */
> > > +	if (have_uref) {
> > > +		if (rest_len == pkt_len)
> > > +			/* No data sent, abort the notification. */
> > > +			net_zcopy_put_abort(uarg, true);
> > 
> > Is it worth optimising for the 'nothing sent' case ?
> 
> What do you suggest doing?
> 
> I followed what TCP does.
> 
> Thanks,
> Stefano
> 
> > 
> > -- David
> > 
> > > +		else
> > > +			net_zcopy_put(uarg);
> > > +	}
> > > +
> > >  	/* Return number of bytes, if any data has been sent. */
> > >  	if (rest_len != pkt_len)
> > >  		ret = pkt_len - rest_len;
> > 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-18  9:33     ` Michael S. Tsirkin
@ 2026-05-18  9:54       ` Stefano Garzarella
  2026-05-18 10:50         ` David Laight
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18  9:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Laight, netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>> > On Thu, 14 May 2026 11:29:48 +0200
>> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> > >
>> > > When a large message is fragmented into multiple skbs, the zerocopy
>> > > uarg is only allocated and attached to the last skb in the loop.
>> > > Non-final skbs carry pinned user pages with no completion tracking,
>> > > so the kernel has no way to notify userspace when those pages are safe
>> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>> > > leaking pinned pages with no completion notification.
>> > >
>> > > Fix this by following the approach used by TCP: allocate the zerocopy
>> > > uarg (if not provided by the caller) before the send loop and attach
>> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> > > Each skb's completion properly decrements the refcount, and the
>> > > notification only fires after the last skb is freed.
>> > > On failure, if no data was sent, the uarg is cleanly aborted via
>> > > net_zcopy_put_abort().
>> > >
>> > > This issue was initially discovered by sashiko while reviewing commit
>> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> > > but was pre-existing.
>> > >
>> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> > > ---
>> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> > >  1 file changed, 34 insertions(+), 49 deletions(-)
>> > >
>> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > > index 989cc252d3d3..1e3409d28164 100644
>> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> > >  	return true;
>> > >  }
>> > >
>> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> > > -					   struct sk_buff *skb,
>> > > -					   struct msghdr *msg,
>> > > -					   size_t pkt_len,
>> > > -					   bool zerocopy)
>> > > -{
>> > > -	struct ubuf_info *uarg;
>> > > -
>> > > -	if (msg->msg_ubuf) {
>> > > -		uarg = msg->msg_ubuf;
>> > > -		net_zcopy_get(uarg);
>> > > -	} else {
>> > > -		struct ubuf_info_msgzc *uarg_zc;
>> > > -
>> > > -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> > > -					    pkt_len, NULL, false);
>> > > -		if (!uarg)
>> > > -			return -1;
>> > > -
>> > > -		uarg_zc = uarg_to_msgzc(uarg);
>> > > -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> > > -	}
>> > > -
>> > > -	skb_zcopy_init(skb, uarg);
>> > > -
>> > > -	return 0;
>> > > -}
>> > > -
>> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
>> > >  				     struct virtio_vsock_pkt_info *info,
>> > >  				     size_t len,
>> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > >  	u32 src_cid, src_port, dst_cid, dst_port;
>> > >  	const struct virtio_transport *t_ops;
>> > >  	struct virtio_vsock_sock *vvs;
>> > > +	struct ubuf_info *uarg = NULL;
>> > >  	u32 pkt_len = info->pkt_len;
>> > >  	bool can_zcopy = false;
>> > > +	bool have_uref = false;
>> > >  	u32 rest_len;
>> > >  	int ret;
>> > >
>> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > >  		if (can_zcopy)
>> > >  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> > >  					    (MAX_SKB_FRAGS * PAGE_SIZE));
>> > > +
>> > > +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> > > +		    info->op == VIRTIO_VSOCK_OP_RW) {
>> > > +			uarg = info->msg->msg_ubuf;
>> > > +
>> > > +			if (!uarg) {
>> > > +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> > > +							    pkt_len, NULL, false);
>> > > +				if (!uarg) {
>> > > +					virtio_transport_put_credit(vvs, pkt_len);
>> > > +					return -ENOMEM;
>> > > +				}
>> > > +
>> > > +				if (!can_zcopy)
>> > > +					uarg_to_msgzc(uarg)->zerocopy = 0;
>> > > +
>> > > +				have_uref = true;
>> > > +			}
>> > > +		}
>> >
>> > Surely that block should only be done if can_zcopy is true?
>> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>> >
>> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>> > The caller has already requested a callback - and you can't add another.
>> >
>> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>
>> I kept the same approach we had before, trying to make as few changes as
>> possible.
>>
>> All these potential issues seem to be pre-existing and should be eventually
>> addressed in other patches IMHO. This patch one only resolves the main issue
>> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>
>the patch is upstream now, right? So pretty much have to be patches on
>top.

If those are actual issues, then yes. TBH, I didn’t look into that 
aspect and left it as it was before. We should take a closer look at how 
MSG_ZEROCOPY is handled.

David, if you think it needs fixing and you have time, feel free to send 
patches on top.

Thanks,
Stefano

>
>> @Arseniy can you help on this?
>>
>> >
>> > >  	}
>> > >
>> > >  	rest_len = pkt_len;
>> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > >  			break;
>> > >  		}
>> > >
>> > > -		/* We process buffer part by part, allocating skb on
>> > > -		 * each iteration. If this is last skb for this buffer
>> > > -		 * and MSG_ZEROCOPY mode is in use - we must allocate
>> > > -		 * completion for the current syscall.
>> > > -		 *
>> > > -		 * Pass pkt_len because msg iter is already consumed
>> > > -		 * by virtio_transport_fill_skb(), so iter->count
>> > > -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
>> > > -		 * accounting done by msg_zerocopy_realloc().
>> > > -		 */
>> > > -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>> > > -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>> > > -			if (virtio_transport_init_zcopy_skb(vsk, skb,
>> > > -							    info->msg,
>> > > -							    pkt_len,
>> > > -							    can_zcopy)) {
>> > > -				kfree_skb(skb);
>> > > -				ret = -ENOMEM;
>> > > -				break;
>> > > -			}
>> > > -		}
>> > > +		skb_zcopy_set(skb, uarg, NULL);
>> > >
>> > >  		virtio_transport_inc_tx_pkt(vvs, skb);
>> > >
>> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > >
>> > >  	virtio_transport_put_credit(vvs, rest_len);
>> > >
>> > > +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
>> > > +	 * skb_zcopy_set() increases it for each skb, so we can drop that
>> >                                                            ^ must
>> >
>> > > +	 * initial reference to keep it balanced.
>> > > +	 */
>> > > +	if (have_uref) {
>> > > +		if (rest_len == pkt_len)
>> > > +			/* No data sent, abort the notification. */
>> > > +			net_zcopy_put_abort(uarg, true);
>> >
>> > Is it worth optimising for the 'nothing sent' case ?
>>
>> What do you suggest doing?
>>
>> I followed what TCP does.
>>
>> Thanks,
>> Stefano
>>
>> >
>> > -- David
>> >
>> > > +		else
>> > > +			net_zcopy_put(uarg);
>> > > +	}
>> > > +
>> > >  	/* Return number of bytes, if any data has been sent. */
>> > >  	if (rest_len != pkt_len)
>> > >  		ret = pkt_len - rest_len;
>> >
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-18  9:54       ` Stefano Garzarella
@ 2026-05-18 10:50         ` David Laight
  2026-05-18 11:08           ` Stefano Garzarella
       [not found]           ` <20260519053951.1C60440015@mx4.sberdevices.ru>
  0 siblings, 2 replies; 13+ messages in thread
From: David Laight @ 2026-05-18 10:50 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Mon, 18 May 2026 11:54:19 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:

> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:  
> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:  
> >> > On Thu, 14 May 2026 11:29:48 +0200
> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >  
> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> >> > >
> >> > > When a large message is fragmented into multiple skbs, the zerocopy
> >> > > uarg is only allocated and attached to the last skb in the loop.
> >> > > Non-final skbs carry pinned user pages with no completion tracking,
> >> > > so the kernel has no way to notify userspace when those pages are safe
> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
> >> > > leaking pinned pages with no completion notification.
> >> > >
> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
> >> > > uarg (if not provided by the caller) before the send loop and attach
> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
> >> > > Each skb's completion properly decrements the refcount, and the
> >> > > notification only fires after the last skb is freed.
> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
> >> > > net_zcopy_put_abort().
> >> > >
> >> > > This issue was initially discovered by sashiko while reviewing commit
> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> >> > > but was pre-existing.
> >> > >
> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >> > > ---
> >> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> >> > >  1 file changed, 34 insertions(+), 49 deletions(-)
> >> > >
> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >> > > index 989cc252d3d3..1e3409d28164 100644
> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> >> > >  	return true;
> >> > >  }
> >> > >
> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> >> > > -					   struct sk_buff *skb,
> >> > > -					   struct msghdr *msg,
> >> > > -					   size_t pkt_len,
> >> > > -					   bool zerocopy)
> >> > > -{
> >> > > -	struct ubuf_info *uarg;
> >> > > -
> >> > > -	if (msg->msg_ubuf) {
> >> > > -		uarg = msg->msg_ubuf;
> >> > > -		net_zcopy_get(uarg);
> >> > > -	} else {
> >> > > -		struct ubuf_info_msgzc *uarg_zc;
> >> > > -
> >> > > -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> >> > > -					    pkt_len, NULL, false);
> >> > > -		if (!uarg)
> >> > > -			return -1;
> >> > > -
> >> > > -		uarg_zc = uarg_to_msgzc(uarg);
> >> > > -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
> >> > > -	}
> >> > > -
> >> > > -	skb_zcopy_init(skb, uarg);
> >> > > -
> >> > > -	return 0;
> >> > > -}
> >> > > -
> >> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
> >> > >  				     struct virtio_vsock_pkt_info *info,
> >> > >  				     size_t len,
> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > >  	u32 src_cid, src_port, dst_cid, dst_port;
> >> > >  	const struct virtio_transport *t_ops;
> >> > >  	struct virtio_vsock_sock *vvs;
> >> > > +	struct ubuf_info *uarg = NULL;
> >> > >  	u32 pkt_len = info->pkt_len;
> >> > >  	bool can_zcopy = false;
> >> > > +	bool have_uref = false;
> >> > >  	u32 rest_len;
> >> > >  	int ret;
> >> > >
> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > >  		if (can_zcopy)
> >> > >  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> >> > >  					    (MAX_SKB_FRAGS * PAGE_SIZE));
> >> > > +
> >> > > +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
> >> > > +		    info->op == VIRTIO_VSOCK_OP_RW) {
> >> > > +			uarg = info->msg->msg_ubuf;
> >> > > +
> >> > > +			if (!uarg) {
> >> > > +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> >> > > +							    pkt_len, NULL, false);
> >> > > +				if (!uarg) {
> >> > > +					virtio_transport_put_credit(vvs, pkt_len);
> >> > > +					return -ENOMEM;
> >> > > +				}
> >> > > +
> >> > > +				if (!can_zcopy)
> >> > > +					uarg_to_msgzc(uarg)->zerocopy = 0;
> >> > > +
> >> > > +				have_uref = true;
> >> > > +			}
> >> > > +		}  
> >> >
> >> > Surely that block should only be done if can_zcopy is true?
> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
> >> >
> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
> >> > The caller has already requested a callback - and you can't add another.
> >> >
> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.  
> >>
> >> I kept the same approach we had before, trying to make as few changes as
> >> possible.
> >>
> >> All these potential issues seem to be pre-existing and should be eventually
> >> addressed in other patches IMHO. This patch one only resolves the main issue
> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.  
> >
> >the patch is upstream now, right? So pretty much have to be patches on
> >top.  
> 
> If those are actual issues, then yes. TBH, I didn’t look into that 
> aspect and left it as it was before. We should take a closer look at how 
> MSG_ZEROCOPY is handled.
> 
> David, if you think it needs fixing and you have time, feel free to send 
> patches on top.

I'm not fully sure how it all works.
Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
be added to all the skb even if the ZEROCOPY flag isn't set.
I was just reading the one function.
But there did look like some very dodgy conditionals.

-- David

> 
> Thanks,
> Stefano
> 
> >  
> >> @Arseniy can you help on this?
> >>  
> >> >  
> >> > >  	}
> >> > >
> >> > >  	rest_len = pkt_len;
> >> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > >  			break;
> >> > >  		}
> >> > >
> >> > > -		/* We process buffer part by part, allocating skb on
> >> > > -		 * each iteration. If this is last skb for this buffer
> >> > > -		 * and MSG_ZEROCOPY mode is in use - we must allocate
> >> > > -		 * completion for the current syscall.
> >> > > -		 *
> >> > > -		 * Pass pkt_len because msg iter is already consumed
> >> > > -		 * by virtio_transport_fill_skb(), so iter->count
> >> > > -		 * can not be used for RLIMIT_MEMLOCK pinned-pages
> >> > > -		 * accounting done by msg_zerocopy_realloc().
> >> > > -		 */
> >> > > -		if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> >> > > -		    skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> >> > > -			if (virtio_transport_init_zcopy_skb(vsk, skb,
> >> > > -							    info->msg,
> >> > > -							    pkt_len,
> >> > > -							    can_zcopy)) {
> >> > > -				kfree_skb(skb);
> >> > > -				ret = -ENOMEM;
> >> > > -				break;
> >> > > -			}
> >> > > -		}
> >> > > +		skb_zcopy_set(skb, uarg, NULL);
> >> > >
> >> > >  		virtio_transport_inc_tx_pkt(vvs, skb);
> >> > >
> >> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > >
> >> > >  	virtio_transport_put_credit(vvs, rest_len);
> >> > >
> >> > > +	/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> >> > > +	 * skb_zcopy_set() increases it for each skb, so we can drop that  
> >> >                                                            ^ must
> >> >  
> >> > > +	 * initial reference to keep it balanced.
> >> > > +	 */
> >> > > +	if (have_uref) {
> >> > > +		if (rest_len == pkt_len)
> >> > > +			/* No data sent, abort the notification. */
> >> > > +			net_zcopy_put_abort(uarg, true);  
> >> >
> >> > Is it worth optimising for the 'nothing sent' case ?  
> >>
> >> What do you suggest doing?
> >>
> >> I followed what TCP does.
> >>
> >> Thanks,
> >> Stefano
> >>  
> >> >
> >> > -- David
> >> >  
> >> > > +		else
> >> > > +			net_zcopy_put(uarg);
> >> > > +	}
> >> > > +
> >> > >  	/* Return number of bytes, if any data has been sent. */
> >> > >  	if (rest_len != pkt_len)
> >> > >  		ret = pkt_len - rest_len;  
> >> >  
> >  
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-18 10:50         ` David Laight
@ 2026-05-18 11:08           ` Stefano Garzarella
       [not found]           ` <20260519053951.1C60440015@mx4.sberdevices.ru>
  1 sibling, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18 11:08 UTC (permalink / raw)
  To: David Laight
  Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>On Mon, 18 May 2026 11:54:19 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>> >> > On Thu, 14 May 2026 11:29:48 +0200
>> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >> >
>> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> >> > >
>> >> > > When a large message is fragmented into multiple skbs, the zerocopy
>> >> > > uarg is only allocated and attached to the last skb in the loop.
>> >> > > Non-final skbs carry pinned user pages with no completion tracking,
>> >> > > so the kernel has no way to notify userspace when those pages are safe
>> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>> >> > > leaking pinned pages with no completion notification.
>> >> > >
>> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
>> >> > > uarg (if not provided by the caller) before the send loop and attach
>> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> >> > > Each skb's completion properly decrements the refcount, and the
>> >> > > notification only fires after the last skb is freed.
>> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
>> >> > > net_zcopy_put_abort().
>> >> > >
>> >> > > This issue was initially discovered by sashiko while reviewing commit
>> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> >> > > but was pre-existing.
>> >> > >
>> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> >> > > ---
>> >> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> >> > >  1 file changed, 34 insertions(+), 49 deletions(-)
>> >> > >
>> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >> > > index 989cc252d3d3..1e3409d28164 100644
>> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> >> > >  	return true;
>> >> > >  }
>> >> > >
>> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> >> > > -					   struct sk_buff *skb,
>> >> > > -					   struct msghdr *msg,
>> >> > > -					   size_t pkt_len,
>> >> > > -					   bool zerocopy)
>> >> > > -{
>> >> > > -	struct ubuf_info *uarg;
>> >> > > -
>> >> > > -	if (msg->msg_ubuf) {
>> >> > > -		uarg = msg->msg_ubuf;
>> >> > > -		net_zcopy_get(uarg);
>> >> > > -	} else {
>> >> > > -		struct ubuf_info_msgzc *uarg_zc;
>> >> > > -
>> >> > > -		uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > -					    pkt_len, NULL, false);
>> >> > > -		if (!uarg)
>> >> > > -			return -1;
>> >> > > -
>> >> > > -		uarg_zc = uarg_to_msgzc(uarg);
>> >> > > -		uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> >> > > -	}
>> >> > > -
>> >> > > -	skb_zcopy_init(skb, uarg);
>> >> > > -
>> >> > > -	return 0;
>> >> > > -}
>> >> > > -
>> >> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
>> >> > >  				     struct virtio_vsock_pkt_info *info,
>> >> > >  				     size_t len,
>> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > >  	u32 src_cid, src_port, dst_cid, dst_port;
>> >> > >  	const struct virtio_transport *t_ops;
>> >> > >  	struct virtio_vsock_sock *vvs;
>> >> > > +	struct ubuf_info *uarg = NULL;
>> >> > >  	u32 pkt_len = info->pkt_len;
>> >> > >  	bool can_zcopy = false;
>> >> > > +	bool have_uref = false;
>> >> > >  	u32 rest_len;
>> >> > >  	int ret;
>> >> > >
>> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > >  		if (can_zcopy)
>> >> > >  			max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> >> > >  					    (MAX_SKB_FRAGS * PAGE_SIZE));
>> >> > > +
>> >> > > +		if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> >> > > +		    info->op == VIRTIO_VSOCK_OP_RW) {
>> >> > > +			uarg = info->msg->msg_ubuf;
>> >> > > +
>> >> > > +			if (!uarg) {
>> >> > > +				uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > +							    pkt_len, NULL, false);
>> >> > > +				if (!uarg) {
>> >> > > +					virtio_transport_put_credit(vvs, pkt_len);
>> >> > > +					return -ENOMEM;
>> >> > > +				}
>> >> > > +
>> >> > > +				if (!can_zcopy)
>> >> > > +					uarg_to_msgzc(uarg)->zerocopy = 0;
>> >> > > +
>> >> > > +				have_uref = true;
>> >> > > +			}
>> >> > > +		}
>> >> >
>> >> > Surely that block should only be done if can_zcopy is true?
>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>> >> >
>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>> >> > The caller has already requested a callback - and you can't add another.
>> >> >
>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>> >>
>> >> I kept the same approach we had before, trying to make as few changes as
>> >> possible.
>> >>
>> >> All these potential issues seem to be pre-existing and should be eventually
>> >> addressed in other patches IMHO. This patch one only resolves the main issue
>> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>> >
>> >the patch is upstream now, right? So pretty much have to be patches on
>> >top.
>>
>> If those are actual issues, then yes. TBH, I didn’t look into that
>> aspect and left it as it was before. We should take a closer look at how
>> MSG_ZEROCOPY is handled.
>>
>> David, if you think it needs fixing and you have time, feel free to send
>> patches on top.
>
>I'm not fully sure how it all works.

Same here, so I pinged Arseniy who worked on that, since it seemed 
deliberate to have `can_zcopy` (and set `uarg->zerocopy` accordingly) 
only when it was supported by the transport.

>Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
>be added to all the skb even if the ZEROCOPY flag isn't set.
>I was just reading the one function.
>But there did look like some very dodgy conditionals.

I see, let's wait for Arseniy's feedback; otherwise, I'll try to fix it 
next week. As mentioned, this issue existed before this patch, so it 
shouldn't be a regression.

Thanks,
Stefano


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
       [not found]           ` <20260519053951.1C60440015@mx4.sberdevices.ru>
@ 2026-05-19  6:37             ` Arseniy Krasnov
  2026-05-19  9:49               ` Stefano Garzarella
  0 siblings, 1 reply; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-19  6:37 UTC (permalink / raw)
  To: Stefano Garzarella, David Laight
  Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi



On 18/05/2026 14:08, Stefano Garzarella wrote:
> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>> On Mon, 18 May 2026 11:54:19 +0200
>> Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>>> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>>> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>>> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>>> >> > On Thu, 14 May 2026 11:29:48 +0200
>>> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>>> >> >
>>> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
>>> >> > >
>>> >> > > When a large message is fragmented into multiple skbs, the zerocopy
>>> >> > > uarg is only allocated and attached to the last skb in the loop.
>>> >> > > Non-final skbs carry pinned user pages with no completion tracking,
>>> >> > > so the kernel has no way to notify userspace when those pages are safe
>>> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>>> >> > > leaking pinned pages with no completion notification.
>>> >> > >
>>> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
>>> >> > > uarg (if not provided by the caller) before the send loop and attach
>>> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>>> >> > > Each skb's completion properly decrements the refcount, and the
>>> >> > > notification only fires after the last skb is freed.
>>> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
>>> >> > > net_zcopy_put_abort().
>>> >> > >
>>> >> > > This issue was initially discovered by sashiko while reviewing commit
>>> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>>> >> > > but was pre-existing.
>>> >> > >
>>> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>>> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>>> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>>> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>>> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>> >> > > ---
>>> >> > >  net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>>> >> > >  1 file changed, 34 insertions(+), 49 deletions(-)
>>> >> > >
>>> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> >> > > index 989cc252d3d3..1e3409d28164 100644
>>> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
>>> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>>> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>>> >> > >      return true;
>>> >> > >  }
>>> >> > >
>>> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>>> >> > > -                       struct sk_buff *skb,
>>> >> > > -                       struct msghdr *msg,
>>> >> > > -                       size_t pkt_len,
>>> >> > > -                       bool zerocopy)
>>> >> > > -{
>>> >> > > -    struct ubuf_info *uarg;
>>> >> > > -
>>> >> > > -    if (msg->msg_ubuf) {
>>> >> > > -        uarg = msg->msg_ubuf;
>>> >> > > -        net_zcopy_get(uarg);
>>> >> > > -    } else {
>>> >> > > -        struct ubuf_info_msgzc *uarg_zc;
>>> >> > > -
>>> >> > > -        uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>>> >> > > -                        pkt_len, NULL, false);
>>> >> > > -        if (!uarg)
>>> >> > > -            return -1;
>>> >> > > -
>>> >> > > -        uarg_zc = uarg_to_msgzc(uarg);
>>> >> > > -        uarg_zc->zerocopy = zerocopy ? 1 : 0;
>>> >> > > -    }
>>> >> > > -
>>> >> > > -    skb_zcopy_init(skb, uarg);
>>> >> > > -
>>> >> > > -    return 0;
>>> >> > > -}
>>> >> > > -
>>> >> > >  static int virtio_transport_fill_skb(struct sk_buff *skb,
>>> >> > >                       struct virtio_vsock_pkt_info *info,
>>> >> > >                       size_t len,
>>> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>> >> > >      u32 src_cid, src_port, dst_cid, dst_port;
>>> >> > >      const struct virtio_transport *t_ops;
>>> >> > >      struct virtio_vsock_sock *vvs;
>>> >> > > +    struct ubuf_info *uarg = NULL;
>>> >> > >      u32 pkt_len = info->pkt_len;
>>> >> > >      bool can_zcopy = false;
>>> >> > > +    bool have_uref = false;
>>> >> > >      u32 rest_len;
>>> >> > >      int ret;
>>> >> > >
>>> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>> >> > >          if (can_zcopy)
>>> >> > >              max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>>> >> > >                          (MAX_SKB_FRAGS * PAGE_SIZE));
>>> >> > > +
>>> >> > > +        if (info->msg->msg_flags & MSG_ZEROCOPY &&
>>> >> > > +            info->op == VIRTIO_VSOCK_OP_RW) {
>>> >> > > +            uarg = info->msg->msg_ubuf;
>>> >> > > +
>>> >> > > +            if (!uarg) {
>>> >> > > +                uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>>> >> > > +                                pkt_len, NULL, false);
>>> >> > > +                if (!uarg) {
>>> >> > > +                    virtio_transport_put_credit(vvs, pkt_len);
>>> >> > > +                    return -ENOMEM;
>>> >> > > +                }
>>> >> > > +
>>> >> > > +                if (!can_zcopy)
>>> >> > > +                    uarg_to_msgzc(uarg)->zerocopy = 0;
>>> >> > > +
>>> >> > > +                have_uref = true;
>>> >> > > +            }
>>> >> > > +        }
>>> >> >

Hi guys! Just some replies after quick look:

>>> >> > Surely that block should only be done if can_zcopy is true?

I guess no, since TCP also allocates uarg even if zerocopy is impossible -
it just sets uarg_to_msgzc(uarg)->zerocopy = 0;

>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?

Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.

>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.

Here I guess it is better to follow TCP way to make same behaviour, because exact
logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.

>>> >> >
>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>> >> > The caller has already requested a callback - and you can't add another.

In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
the same way as 'msg_ubuf' is NULL.

>>> >> >
>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>> >>

Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
'can_zcopy' was false).


@Stefano, I guess current implementation differs from TCP in two cases (at least from first
view):
1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
   (if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
   'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
   no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some extra checks which are
   missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.

But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
current TCP flow as close as possible:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
2) To copy data we can use 'skb_zerocopy_iter_stream()'.
3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.

I can take this task, but pls need some time - may be two/three weeks due to another tasks. 

What do You think?

Thanks

>>> >> I kept the same approach we had before, trying to make as few changes as
>>> >> possible.
>>> >>
>>> >> All these potential issues seem to be pre-existing and should be eventually
>>> >> addressed in other patches IMHO. This patch one only resolves the main issue
>>> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>>> >
>>> >the patch is upstream now, right? So pretty much have to be patches on
>>> >top.
>>>
>>> If those are actual issues, then yes. TBH, I didn’t look into that
>>> aspect and left it as it was before. We should take a closer look at how
>>> MSG_ZEROCOPY is handled.
>>>
>>> David, if you think it needs fixing and you have time, feel free to send
>>> patches on top.
>>
>> I'm not fully sure how it all works.
> 
> Same here, so I pinged Arseniy who worked on that, since it seemed deliberate to have `can_zcopy` (and set `uarg->zerocopy` accordingly) only when it was supported by the transport.
> 
>> Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
>> be added to all the skb even if the ZEROCOPY flag isn't set.
>> I was just reading the one function.
>> But there did look like some very dodgy conditionals.
> 
> I see, let's wait for Arseniy's feedback; otherwise, I'll try to fix it next week. As mentioned, this issue existed before this patch, so it shouldn't be a regression.
> 
> Thanks,
> Stefano
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-19  6:37             ` Arseniy Krasnov
@ 2026-05-19  9:49               ` Stefano Garzarella
  2026-05-19 10:40                 ` Arseniy Krasnov
  0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-19  9:49 UTC (permalink / raw)
  To: Arseniy Krasnov
  Cc: David Laight, Michael S. Tsirkin, netdev, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi

On Tue, May 19, 2026 at 09:37:23AM +0300, Arseniy Krasnov wrote:
>On 18/05/2026 14:08, Stefano Garzarella wrote:
>> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>>> On Mon, 18 May 2026 11:54:19 +0200
>>> Stefano Garzarella <sgarzare@redhat.com> wrote:

[...]

>
>Hi guys! Just some replies after quick look:
>
>>>> >> > Surely that block should only be done if can_zcopy is true?
>
>I guess no, since TCP also allocates uarg even if zerocopy is impossible -
>it just sets uarg_to_msgzc(uarg)->zerocopy = 0;
>
>>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>
>Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
>because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
>'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.
>
>>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>
>Here I guess it is better to follow TCP way to make same behaviour, because exact
>logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.
>
>>>> >> >
>>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>>> >> > The caller has already requested a callback - and you can't add another.
>
>In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
>the same way as 'msg_ubuf' is NULL.
>
>>>> >> >
>>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>>> >>
>
>Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
>allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
>'can_zcopy' was false).
>
>
>@Stefano, I guess current implementation differs from TCP in two cases (at least from first
>view):
>1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
>   (if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
>   'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
>   no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
>2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some 
>extra checks which are
>   missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.
>
>But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
>current TCP flow as close as possible:
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
>1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
>2) To copy data we can use 'skb_zerocopy_iter_stream()'.
>3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.
>
>I can take this task, but pls need some time - may be two/three weeks due to another tasks.
>
>What do You think?

If you can work on it, please go head. They seems all pre-existing, so 
2/3 weeks are fine. Please let me know if you can't and I'll try to 
allocate some time.

Thanks for the help!

Stefano


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
  2026-05-19  9:49               ` Stefano Garzarella
@ 2026-05-19 10:40                 ` Arseniy Krasnov
  0 siblings, 0 replies; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-19 10:40 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: David Laight, Michael S. Tsirkin, netdev, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
	Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
	Jason Wang, linux-kernel, Maher Azzouzi



On 19/05/2026 12:49, Stefano Garzarella wrote:
> On Tue, May 19, 2026 at 09:37:23AM +0300, Arseniy Krasnov wrote:
>> On 18/05/2026 14:08, Stefano Garzarella wrote:
>>> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>>>> On Mon, 18 May 2026 11:54:19 +0200
>>>> Stefano Garzarella <sgarzare@redhat.com> wrote:
> 
> [...]
> 
>>
>> Hi guys! Just some replies after quick look:
>>
>>>>> >> > Surely that block should only be done if can_zcopy is true?
>>
>> I guess no, since TCP also allocates uarg even if zerocopy is impossible -
>> it just sets uarg_to_msgzc(uarg)->zerocopy = 0;
>>
>>>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>>
>> Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
>> because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
>> 'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.
>>
>>>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>>
>> Here I guess it is better to follow TCP way to make same behaviour, because exact
>> logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.
>>
>>>>> >> >
>>>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>>>> >> > The caller has already requested a callback - and you can't add another.
>>
>> In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
>> the same way as 'msg_ubuf' is NULL.
>>
>>>>> >> >
>>>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>>>> >>
>>
>> Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
>> allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
>> 'can_zcopy' was false).
>>
>>
>> @Stefano, I guess current implementation differs from TCP in two cases (at least from first
>> view):
>> 1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
>>   (if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
>>   'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
>>   no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
>> 2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some extra checks which are
>>   missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.
>>
>> But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
>> current TCP flow as close as possible:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
>> 1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
>> 2) To copy data we can use 'skb_zerocopy_iter_stream()'.
>> 3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.
>>
>> I can take this task, but pls need some time - may be two/three weeks due to another tasks.
>>
>> What do You think?
> 
> If you can work on it, please go head. They seems all pre-existing, so 2/3 weeks are fine. Please let me know if you can't and I'll try to allocate some time.

No problem I'll take it. If something goes wrong or stuck - i'll let you know

Thanks

> 
> Thanks for the help!
> 
> Stefano
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-19 10:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14  9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
2026-05-14 14:07 ` Michael S. Tsirkin
2026-05-15 17:18 ` Arseniy Krasnov
2026-05-16  0:50 ` patchwork-bot+netdevbpf
2026-05-16 11:53 ` David Laight
2026-05-18  9:18   ` Stefano Garzarella
2026-05-18  9:33     ` Michael S. Tsirkin
2026-05-18  9:54       ` Stefano Garzarella
2026-05-18 10:50         ` David Laight
2026-05-18 11:08           ` Stefano Garzarella
     [not found]           ` <20260519053951.1C60440015@mx4.sberdevices.ru>
2026-05-19  6:37             ` Arseniy Krasnov
2026-05-19  9:49               ` Stefano Garzarella
2026-05-19 10:40                 ` Arseniy Krasnov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox