* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-14 9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
@ 2026-05-14 14:07 ` Michael S. Tsirkin
2026-05-15 17:18 ` Arseniy Krasnov
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Michael S. Tsirkin @ 2026-05-14 14:07 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Thu, May 14, 2026 at 11:29:48AM +0200, Stefano Garzarella wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
>
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
>
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
>
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> 1 file changed, 34 insertions(+), 49 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> return true;
> }
>
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> - struct sk_buff *skb,
> - struct msghdr *msg,
> - size_t pkt_len,
> - bool zerocopy)
> -{
> - struct ubuf_info *uarg;
> -
> - if (msg->msg_ubuf) {
> - uarg = msg->msg_ubuf;
> - net_zcopy_get(uarg);
> - } else {
> - struct ubuf_info_msgzc *uarg_zc;
> -
> - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> - pkt_len, NULL, false);
> - if (!uarg)
> - return -1;
> -
> - uarg_zc = uarg_to_msgzc(uarg);
> - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> - }
> -
> - skb_zcopy_init(skb, uarg);
> -
> - return 0;
> -}
> -
> static int virtio_transport_fill_skb(struct sk_buff *skb,
> struct virtio_vsock_pkt_info *info,
> size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> u32 src_cid, src_port, dst_cid, dst_port;
> const struct virtio_transport *t_ops;
> struct virtio_vsock_sock *vvs;
> + struct ubuf_info *uarg = NULL;
> u32 pkt_len = info->pkt_len;
> bool can_zcopy = false;
> + bool have_uref = false;
> u32 rest_len;
> int ret;
>
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> if (can_zcopy)
> max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> + info->op == VIRTIO_VSOCK_OP_RW) {
> + uarg = info->msg->msg_ubuf;
> +
> + if (!uarg) {
> + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> + pkt_len, NULL, false);
> + if (!uarg) {
> + virtio_transport_put_credit(vvs, pkt_len);
> + return -ENOMEM;
> + }
> +
> + if (!can_zcopy)
> + uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> + have_uref = true;
> + }
> + }
> }
>
> rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> break;
> }
>
> - /* We process buffer part by part, allocating skb on
> - * each iteration. If this is last skb for this buffer
> - * and MSG_ZEROCOPY mode is in use - we must allocate
> - * completion for the current syscall.
> - *
> - * Pass pkt_len because msg iter is already consumed
> - * by virtio_transport_fill_skb(), so iter->count
> - * can not be used for RLIMIT_MEMLOCK pinned-pages
> - * accounting done by msg_zerocopy_realloc().
> - */
> - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> - if (virtio_transport_init_zcopy_skb(vsk, skb,
> - info->msg,
> - pkt_len,
> - can_zcopy)) {
> - kfree_skb(skb);
> - ret = -ENOMEM;
> - break;
> - }
> - }
> + skb_zcopy_set(skb, uarg, NULL);
>
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>
> virtio_transport_put_credit(vvs, rest_len);
>
> + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> + * skb_zcopy_set() increases it for each skb, so we can drop that
> + * initial reference to keep it balanced.
> + */
> + if (have_uref) {
> + if (rest_len == pkt_len)
> + /* No data sent, abort the notification. */
> + net_zcopy_put_abort(uarg, true);
> + else
> + net_zcopy_put(uarg);
> + }
> +
> /* Return number of bytes, if any data has been sent. */
> if (rest_len != pkt_len)
> ret = pkt_len - rest_len;
> --
> 2.54.0
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-14 9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
2026-05-14 14:07 ` Michael S. Tsirkin
@ 2026-05-15 17:18 ` Arseniy Krasnov
2026-05-16 0:50 ` patchwork-bot+netdevbpf
2026-05-16 11:53 ` David Laight
3 siblings, 0 replies; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-15 17:18 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, Jakub Kicinski, Paolo Abeni,, Simon Horman,
Stefan Hajnoczi, kvm, Eric Dumazet, Xuan Zhuo,, virtualization,
David S., Miller, Jason Wang, linux-kernel, Maher Azzouzi,
eperezma
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
>
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
>
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
>
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> 1 file changed, 34 insertions(+), 49 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> return true;
> }
>
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> - struct sk_buff *skb,
> - struct msghdr *msg,
> - size_t pkt_len,
> - bool zerocopy)
> -{
> - struct ubuf_info *uarg;
> -
> - if (msg->msg_ubuf) {
> - uarg = msg->msg_ubuf;
> - net_zcopy_get(uarg);
> - } else {
> - struct ubuf_info_msgzc *uarg_zc;
> -
> - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> - pkt_len, NULL, false);
> - if (!uarg)
> - return -1;
> -
> - uarg_zc = uarg_to_msgzc(uarg);
> - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> - }
> -
> - skb_zcopy_init(skb, uarg);
> -
> - return 0;
> -}
> -
> static int virtio_transport_fill_skb(struct sk_buff *skb,
> struct virtio_vsock_pkt_info *info,
> size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> u32 src_cid, src_port, dst_cid, dst_port;
> const struct virtio_transport *t_ops;
> struct virtio_vsock_sock *vvs;
> + struct ubuf_info *uarg = NULL;
> u32 pkt_len = info->pkt_len;
> bool can_zcopy = false;
> + bool have_uref = false;
> u32 rest_len;
> int ret;
>
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> if (can_zcopy)
> max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> + info->op == VIRTIO_VSOCK_OP_RW) {
> + uarg = info->msg->msg_ubuf;
> +
> + if (!uarg) {
> + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> + pkt_len, NULL, false);
> + if (!uarg) {
> + virtio_transport_put_credit(vvs, pkt_len);
> + return -ENOMEM;
> + }
> +
> + if (!can_zcopy)
> + uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> + have_uref = true;
> + }
> + }
> }
>
> rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> break;
> }
>
> - /* We process buffer part by part, allocating skb on
> - * each iteration. If this is last skb for this buffer
> - * and MSG_ZEROCOPY mode is in use - we must allocate
> - * completion for the current syscall.
> - *
> - * Pass pkt_len because msg iter is already consumed
> - * by virtio_transport_fill_skb(), so iter->count
> - * can not be used for RLIMIT_MEMLOCK pinned-pages
> - * accounting done by msg_zerocopy_realloc().
> - */
> - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> - if (virtio_transport_init_zcopy_skb(vsk, skb,
> - info->msg,
> - pkt_len,
> - can_zcopy)) {
> - kfree_skb(skb);
> - ret = -ENOMEM;
> - break;
> - }
> - }
> + skb_zcopy_set(skb, uarg, NULL);
>
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>
> virtio_transport_put_credit(vvs, rest_len);
>
> + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> + * skb_zcopy_set() increases it for each skb, so we can drop that
> + * initial reference to keep it balanced.
> + */
> + if (have_uref) {
> + if (rest_len == pkt_len)
> + /* No data sent, abort the notification. */
> + net_zcopy_put_abort(uarg, true);
> + else
> + net_zcopy_put(uarg);
> + }
> +
> /* Return number of bytes, if any data has been sent. */
> if (rest_len != pkt_len)
> ret = pkt_len - rest_len;
> --
> 2.54.0
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-14 9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
2026-05-14 14:07 ` Michael S. Tsirkin
2026-05-15 17:18 ` Arseniy Krasnov
@ 2026-05-16 0:50 ` patchwork-bot+netdevbpf
2026-05-16 11:53 ` David Laight
3 siblings, 0 replies; 13+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-16 0:50 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, kuba, pabeni, horms, avkrasnov, stefanha, kvm, edumazet,
eperezma, mst, xuanzhuo, virtualization, davem, jasowang,
linux-kernel, maherazz04
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 14 May 2026 11:29:48 +0200 you wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
>
> [...]
Here is the summary with links:
- [net] vsock/virtio: fix zerocopy completion for multi-skb sends
https://git.kernel.org/netdev/net/c/ae38d9179190
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-14 9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
` (2 preceding siblings ...)
2026-05-16 0:50 ` patchwork-bot+netdevbpf
@ 2026-05-16 11:53 ` David Laight
2026-05-18 9:18 ` Stefano Garzarella
3 siblings, 1 reply; 13+ messages in thread
From: David Laight @ 2026-05-16 11:53 UTC (permalink / raw)
To: Stefano Garzarella
Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Michael S. Tsirkin, Xuan Zhuo, virtualization,
David S. Miller, Jason Wang, linux-kernel, Maher Azzouzi
On Thu, 14 May 2026 11:29:48 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:
> From: Stefano Garzarella <sgarzare@redhat.com>
>
> When a large message is fragmented into multiple skbs, the zerocopy
> uarg is only allocated and attached to the last skb in the loop.
> Non-final skbs carry pinned user pages with no completion tracking,
> so the kernel has no way to notify userspace when those pages are safe
> to reuse. If the loop breaks early the uarg is never allocated at all,
> leaking pinned pages with no completion notification.
>
> Fix this by following the approach used by TCP: allocate the zerocopy
> uarg (if not provided by the caller) before the send loop and attach
> it to every skb via skb_zcopy_set(), which takes a reference per skb.
> Each skb's completion properly decrements the refcount, and the
> notification only fires after the last skb is freed.
> On failure, if no data was sent, the uarg is cleanly aborted via
> net_zcopy_put_abort().
>
> This issue was initially discovered by sashiko while reviewing commit
> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> but was pre-existing.
>
> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
> net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> 1 file changed, 34 insertions(+), 49 deletions(-)
>
> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> index 989cc252d3d3..1e3409d28164 100644
> --- a/net/vmw_vsock/virtio_transport_common.c
> +++ b/net/vmw_vsock/virtio_transport_common.c
> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> return true;
> }
>
> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> - struct sk_buff *skb,
> - struct msghdr *msg,
> - size_t pkt_len,
> - bool zerocopy)
> -{
> - struct ubuf_info *uarg;
> -
> - if (msg->msg_ubuf) {
> - uarg = msg->msg_ubuf;
> - net_zcopy_get(uarg);
> - } else {
> - struct ubuf_info_msgzc *uarg_zc;
> -
> - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> - pkt_len, NULL, false);
> - if (!uarg)
> - return -1;
> -
> - uarg_zc = uarg_to_msgzc(uarg);
> - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> - }
> -
> - skb_zcopy_init(skb, uarg);
> -
> - return 0;
> -}
> -
> static int virtio_transport_fill_skb(struct sk_buff *skb,
> struct virtio_vsock_pkt_info *info,
> size_t len,
> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> u32 src_cid, src_port, dst_cid, dst_port;
> const struct virtio_transport *t_ops;
> struct virtio_vsock_sock *vvs;
> + struct ubuf_info *uarg = NULL;
> u32 pkt_len = info->pkt_len;
> bool can_zcopy = false;
> + bool have_uref = false;
> u32 rest_len;
> int ret;
>
> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> if (can_zcopy)
> max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> (MAX_SKB_FRAGS * PAGE_SIZE));
> +
> + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> + info->op == VIRTIO_VSOCK_OP_RW) {
> + uarg = info->msg->msg_ubuf;
> +
> + if (!uarg) {
> + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> + pkt_len, NULL, false);
> + if (!uarg) {
> + virtio_transport_put_credit(vvs, pkt_len);
> + return -ENOMEM;
> + }
> +
> + if (!can_zcopy)
> + uarg_to_msgzc(uarg)->zerocopy = 0;
> +
> + have_uref = true;
> + }
> + }
Surely that block should only be done if can_zcopy is true?
And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
It info->msg->msg_buf is already set then I think you have to disable zero-copy.
The caller has already requested a callback - and you can't add another.
In any case by the end of this can_zcopy and have_uref are really the same flag.
> }
>
> rest_len = pkt_len;
> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> break;
> }
>
> - /* We process buffer part by part, allocating skb on
> - * each iteration. If this is last skb for this buffer
> - * and MSG_ZEROCOPY mode is in use - we must allocate
> - * completion for the current syscall.
> - *
> - * Pass pkt_len because msg iter is already consumed
> - * by virtio_transport_fill_skb(), so iter->count
> - * can not be used for RLIMIT_MEMLOCK pinned-pages
> - * accounting done by msg_zerocopy_realloc().
> - */
> - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> - if (virtio_transport_init_zcopy_skb(vsk, skb,
> - info->msg,
> - pkt_len,
> - can_zcopy)) {
> - kfree_skb(skb);
> - ret = -ENOMEM;
> - break;
> - }
> - }
> + skb_zcopy_set(skb, uarg, NULL);
>
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>
> virtio_transport_put_credit(vvs, rest_len);
>
> + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> + * skb_zcopy_set() increases it for each skb, so we can drop that
^ must
> + * initial reference to keep it balanced.
> + */
> + if (have_uref) {
> + if (rest_len == pkt_len)
> + /* No data sent, abort the notification. */
> + net_zcopy_put_abort(uarg, true);
Is it worth optimising for the 'nothing sent' case ?
-- David
> + else
> + net_zcopy_put(uarg);
> + }
> +
> /* Return number of bytes, if any data has been sent. */
> if (rest_len != pkt_len)
> ret = pkt_len - rest_len;
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-16 11:53 ` David Laight
@ 2026-05-18 9:18 ` Stefano Garzarella
2026-05-18 9:33 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18 9:18 UTC (permalink / raw)
To: David Laight
Cc: netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Michael S. Tsirkin, Xuan Zhuo, virtualization,
David S. Miller, Jason Wang, linux-kernel, Maher Azzouzi
On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>On Thu, 14 May 2026 11:29:48 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> From: Stefano Garzarella <sgarzare@redhat.com>
>>
>> When a large message is fragmented into multiple skbs, the zerocopy
>> uarg is only allocated and attached to the last skb in the loop.
>> Non-final skbs carry pinned user pages with no completion tracking,
>> so the kernel has no way to notify userspace when those pages are safe
>> to reuse. If the loop breaks early the uarg is never allocated at all,
>> leaking pinned pages with no completion notification.
>>
>> Fix this by following the approach used by TCP: allocate the zerocopy
>> uarg (if not provided by the caller) before the send loop and attach
>> it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> Each skb's completion properly decrements the refcount, and the
>> notification only fires after the last skb is freed.
>> On failure, if no data was sent, the uarg is cleanly aborted via
>> net_zcopy_put_abort().
>>
>> This issue was initially discovered by sashiko while reviewing commit
>> 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> but was pre-existing.
>>
>> Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> ---
>> net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> 1 file changed, 34 insertions(+), 49 deletions(-)
>>
>> diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> index 989cc252d3d3..1e3409d28164 100644
>> --- a/net/vmw_vsock/virtio_transport_common.c
>> +++ b/net/vmw_vsock/virtio_transport_common.c
>> @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> return true;
>> }
>>
>> -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> - struct sk_buff *skb,
>> - struct msghdr *msg,
>> - size_t pkt_len,
>> - bool zerocopy)
>> -{
>> - struct ubuf_info *uarg;
>> -
>> - if (msg->msg_ubuf) {
>> - uarg = msg->msg_ubuf;
>> - net_zcopy_get(uarg);
>> - } else {
>> - struct ubuf_info_msgzc *uarg_zc;
>> -
>> - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> - pkt_len, NULL, false);
>> - if (!uarg)
>> - return -1;
>> -
>> - uarg_zc = uarg_to_msgzc(uarg);
>> - uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> - }
>> -
>> - skb_zcopy_init(skb, uarg);
>> -
>> - return 0;
>> -}
>> -
>> static int virtio_transport_fill_skb(struct sk_buff *skb,
>> struct virtio_vsock_pkt_info *info,
>> size_t len,
>> @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> u32 src_cid, src_port, dst_cid, dst_port;
>> const struct virtio_transport *t_ops;
>> struct virtio_vsock_sock *vvs;
>> + struct ubuf_info *uarg = NULL;
>> u32 pkt_len = info->pkt_len;
>> bool can_zcopy = false;
>> + bool have_uref = false;
>> u32 rest_len;
>> int ret;
>>
>> @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> if (can_zcopy)
>> max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> (MAX_SKB_FRAGS * PAGE_SIZE));
>> +
>> + if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> + info->op == VIRTIO_VSOCK_OP_RW) {
>> + uarg = info->msg->msg_ubuf;
>> +
>> + if (!uarg) {
>> + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> + pkt_len, NULL, false);
>> + if (!uarg) {
>> + virtio_transport_put_credit(vvs, pkt_len);
>> + return -ENOMEM;
>> + }
>> +
>> + if (!can_zcopy)
>> + uarg_to_msgzc(uarg)->zerocopy = 0;
>> +
>> + have_uref = true;
>> + }
>> + }
>
>Surely that block should only be done if can_zcopy is true?
>And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>
>It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>The caller has already requested a callback - and you can't add another.
>
>In any case by the end of this can_zcopy and have_uref are really the same flag.
I kept the same approach we had before, trying to make as few changes as
possible.
All these potential issues seem to be pre-existing and should be
eventually addressed in other patches IMHO. This patch one only
resolves the main issue of calling `skb_zcopy_set()` for every skb to
avoid leaking pages, etc.
@Arseniy can you help on this?
>
>> }
>>
>> rest_len = pkt_len;
>> @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> break;
>> }
>>
>> - /* We process buffer part by part, allocating skb on
>> - * each iteration. If this is last skb for this buffer
>> - * and MSG_ZEROCOPY mode is in use - we must allocate
>> - * completion for the current syscall.
>> - *
>> - * Pass pkt_len because msg iter is already consumed
>> - * by virtio_transport_fill_skb(), so iter->count
>> - * can not be used for RLIMIT_MEMLOCK pinned-pages
>> - * accounting done by msg_zerocopy_realloc().
>> - */
>> - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>> - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>> - if (virtio_transport_init_zcopy_skb(vsk, skb,
>> - info->msg,
>> - pkt_len,
>> - can_zcopy)) {
>> - kfree_skb(skb);
>> - ret = -ENOMEM;
>> - break;
>> - }
>> - }
>> + skb_zcopy_set(skb, uarg, NULL);
>>
>> virtio_transport_inc_tx_pkt(vvs, skb);
>>
>> @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>
>> virtio_transport_put_credit(vvs, rest_len);
>>
>> + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
>> + * skb_zcopy_set() increases it for each skb, so we can drop that
> ^ must
>
>> + * initial reference to keep it balanced.
>> + */
>> + if (have_uref) {
>> + if (rest_len == pkt_len)
>> + /* No data sent, abort the notification. */
>> + net_zcopy_put_abort(uarg, true);
>
>Is it worth optimising for the 'nothing sent' case ?
What do you suggest doing?
I followed what TCP does.
Thanks,
Stefano
>
>-- David
>
>> + else
>> + net_zcopy_put(uarg);
>> + }
>> +
>> /* Return number of bytes, if any data has been sent. */
>> if (rest_len != pkt_len)
>> ret = pkt_len - rest_len;
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-18 9:18 ` Stefano Garzarella
@ 2026-05-18 9:33 ` Michael S. Tsirkin
2026-05-18 9:54 ` Stefano Garzarella
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2026-05-18 9:33 UTC (permalink / raw)
To: Stefano Garzarella
Cc: David Laight, netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
> > On Thu, 14 May 2026 11:29:48 +0200
> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > When a large message is fragmented into multiple skbs, the zerocopy
> > > uarg is only allocated and attached to the last skb in the loop.
> > > Non-final skbs carry pinned user pages with no completion tracking,
> > > so the kernel has no way to notify userspace when those pages are safe
> > > to reuse. If the loop breaks early the uarg is never allocated at all,
> > > leaking pinned pages with no completion notification.
> > >
> > > Fix this by following the approach used by TCP: allocate the zerocopy
> > > uarg (if not provided by the caller) before the send loop and attach
> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
> > > Each skb's completion properly decrements the refcount, and the
> > > notification only fires after the last skb is freed.
> > > On failure, if no data was sent, the uarg is cleanly aborted via
> > > net_zcopy_put_abort().
> > >
> > > This issue was initially discovered by sashiko while reviewing commit
> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> > > but was pre-existing.
> > >
> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> > > 1 file changed, 34 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 989cc252d3d3..1e3409d28164 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> > > return true;
> > > }
> > >
> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> > > - struct sk_buff *skb,
> > > - struct msghdr *msg,
> > > - size_t pkt_len,
> > > - bool zerocopy)
> > > -{
> > > - struct ubuf_info *uarg;
> > > -
> > > - if (msg->msg_ubuf) {
> > > - uarg = msg->msg_ubuf;
> > > - net_zcopy_get(uarg);
> > > - } else {
> > > - struct ubuf_info_msgzc *uarg_zc;
> > > -
> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > - pkt_len, NULL, false);
> > > - if (!uarg)
> > > - return -1;
> > > -
> > > - uarg_zc = uarg_to_msgzc(uarg);
> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> > > - }
> > > -
> > > - skb_zcopy_init(skb, uarg);
> > > -
> > > - return 0;
> > > -}
> > > -
> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
> > > struct virtio_vsock_pkt_info *info,
> > > size_t len,
> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > u32 src_cid, src_port, dst_cid, dst_port;
> > > const struct virtio_transport *t_ops;
> > > struct virtio_vsock_sock *vvs;
> > > + struct ubuf_info *uarg = NULL;
> > > u32 pkt_len = info->pkt_len;
> > > bool can_zcopy = false;
> > > + bool have_uref = false;
> > > u32 rest_len;
> > > int ret;
> > >
> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > if (can_zcopy)
> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> > > (MAX_SKB_FRAGS * PAGE_SIZE));
> > > +
> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> > > + info->op == VIRTIO_VSOCK_OP_RW) {
> > > + uarg = info->msg->msg_ubuf;
> > > +
> > > + if (!uarg) {
> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > + pkt_len, NULL, false);
> > > + if (!uarg) {
> > > + virtio_transport_put_credit(vvs, pkt_len);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + if (!can_zcopy)
> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
> > > +
> > > + have_uref = true;
> > > + }
> > > + }
> >
> > Surely that block should only be done if can_zcopy is true?
> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
> >
> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
> > The caller has already requested a callback - and you can't add another.
> >
> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>
> I kept the same approach we had before, trying to make as few changes as
> possible.
>
> All these potential issues seem to be pre-existing and should be eventually
> addressed in other patches IMHO. This patch one only resolves the main issue
> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
the patch is upstream now, right? So pretty much have to be patches on
top.
> @Arseniy can you help on this?
>
> >
> > > }
> > >
> > > rest_len = pkt_len;
> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > break;
> > > }
> > >
> > > - /* We process buffer part by part, allocating skb on
> > > - * each iteration. If this is last skb for this buffer
> > > - * and MSG_ZEROCOPY mode is in use - we must allocate
> > > - * completion for the current syscall.
> > > - *
> > > - * Pass pkt_len because msg iter is already consumed
> > > - * by virtio_transport_fill_skb(), so iter->count
> > > - * can not be used for RLIMIT_MEMLOCK pinned-pages
> > > - * accounting done by msg_zerocopy_realloc().
> > > - */
> > > - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> > > - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> > > - if (virtio_transport_init_zcopy_skb(vsk, skb,
> > > - info->msg,
> > > - pkt_len,
> > > - can_zcopy)) {
> > > - kfree_skb(skb);
> > > - ret = -ENOMEM;
> > > - break;
> > > - }
> > > - }
> > > + skb_zcopy_set(skb, uarg, NULL);
> > >
> > > virtio_transport_inc_tx_pkt(vvs, skb);
> > >
> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >
> > > virtio_transport_put_credit(vvs, rest_len);
> > >
> > > + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> > > + * skb_zcopy_set() increases it for each skb, so we can drop that
> > ^ must
> >
> > > + * initial reference to keep it balanced.
> > > + */
> > > + if (have_uref) {
> > > + if (rest_len == pkt_len)
> > > + /* No data sent, abort the notification. */
> > > + net_zcopy_put_abort(uarg, true);
> >
> > Is it worth optimising for the 'nothing sent' case ?
>
> What do you suggest doing?
>
> I followed what TCP does.
>
> Thanks,
> Stefano
>
> >
> > -- David
> >
> > > + else
> > > + net_zcopy_put(uarg);
> > > + }
> > > +
> > > /* Return number of bytes, if any data has been sent. */
> > > if (rest_len != pkt_len)
> > > ret = pkt_len - rest_len;
> >
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-18 9:33 ` Michael S. Tsirkin
@ 2026-05-18 9:54 ` Stefano Garzarella
2026-05-18 10:50 ` David Laight
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18 9:54 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: David Laight, netdev, Jakub Kicinski, Paolo Abeni, Simon Horman,
Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>> > On Thu, 14 May 2026 11:29:48 +0200
>> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >
>> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> > >
>> > > When a large message is fragmented into multiple skbs, the zerocopy
>> > > uarg is only allocated and attached to the last skb in the loop.
>> > > Non-final skbs carry pinned user pages with no completion tracking,
>> > > so the kernel has no way to notify userspace when those pages are safe
>> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>> > > leaking pinned pages with no completion notification.
>> > >
>> > > Fix this by following the approach used by TCP: allocate the zerocopy
>> > > uarg (if not provided by the caller) before the send loop and attach
>> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> > > Each skb's completion properly decrements the refcount, and the
>> > > notification only fires after the last skb is freed.
>> > > On failure, if no data was sent, the uarg is cleanly aborted via
>> > > net_zcopy_put_abort().
>> > >
>> > > This issue was initially discovered by sashiko while reviewing commit
>> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> > > but was pre-existing.
>> > >
>> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> > > ---
>> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> > > 1 file changed, 34 insertions(+), 49 deletions(-)
>> > >
>> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> > > index 989cc252d3d3..1e3409d28164 100644
>> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> > > return true;
>> > > }
>> > >
>> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> > > - struct sk_buff *skb,
>> > > - struct msghdr *msg,
>> > > - size_t pkt_len,
>> > > - bool zerocopy)
>> > > -{
>> > > - struct ubuf_info *uarg;
>> > > -
>> > > - if (msg->msg_ubuf) {
>> > > - uarg = msg->msg_ubuf;
>> > > - net_zcopy_get(uarg);
>> > > - } else {
>> > > - struct ubuf_info_msgzc *uarg_zc;
>> > > -
>> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> > > - pkt_len, NULL, false);
>> > > - if (!uarg)
>> > > - return -1;
>> > > -
>> > > - uarg_zc = uarg_to_msgzc(uarg);
>> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> > > - }
>> > > -
>> > > - skb_zcopy_init(skb, uarg);
>> > > -
>> > > - return 0;
>> > > -}
>> > > -
>> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
>> > > struct virtio_vsock_pkt_info *info,
>> > > size_t len,
>> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > > u32 src_cid, src_port, dst_cid, dst_port;
>> > > const struct virtio_transport *t_ops;
>> > > struct virtio_vsock_sock *vvs;
>> > > + struct ubuf_info *uarg = NULL;
>> > > u32 pkt_len = info->pkt_len;
>> > > bool can_zcopy = false;
>> > > + bool have_uref = false;
>> > > u32 rest_len;
>> > > int ret;
>> > >
>> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > > if (can_zcopy)
>> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> > > (MAX_SKB_FRAGS * PAGE_SIZE));
>> > > +
>> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> > > + info->op == VIRTIO_VSOCK_OP_RW) {
>> > > + uarg = info->msg->msg_ubuf;
>> > > +
>> > > + if (!uarg) {
>> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> > > + pkt_len, NULL, false);
>> > > + if (!uarg) {
>> > > + virtio_transport_put_credit(vvs, pkt_len);
>> > > + return -ENOMEM;
>> > > + }
>> > > +
>> > > + if (!can_zcopy)
>> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
>> > > +
>> > > + have_uref = true;
>> > > + }
>> > > + }
>> >
>> > Surely that block should only be done if can_zcopy is true?
>> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>> >
>> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>> > The caller has already requested a callback - and you can't add another.
>> >
>> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>
>> I kept the same approach we had before, trying to make as few changes as
>> possible.
>>
>> All these potential issues seem to be pre-existing and should be eventually
>> addressed in other patches IMHO. This patch one only resolves the main issue
>> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>
>the patch is upstream now, right? So pretty much have to be patches on
>top.
If those are actual issues, then yes. TBH, I didn’t look into that
aspect and left it as it was before. We should take a closer look at how
MSG_ZEROCOPY is handled.
David, if you think it needs fixing and you have time, feel free to send
patches on top.
Thanks,
Stefano
>
>> @Arseniy can you help on this?
>>
>> >
>> > > }
>> > >
>> > > rest_len = pkt_len;
>> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > > break;
>> > > }
>> > >
>> > > - /* We process buffer part by part, allocating skb on
>> > > - * each iteration. If this is last skb for this buffer
>> > > - * and MSG_ZEROCOPY mode is in use - we must allocate
>> > > - * completion for the current syscall.
>> > > - *
>> > > - * Pass pkt_len because msg iter is already consumed
>> > > - * by virtio_transport_fill_skb(), so iter->count
>> > > - * can not be used for RLIMIT_MEMLOCK pinned-pages
>> > > - * accounting done by msg_zerocopy_realloc().
>> > > - */
>> > > - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
>> > > - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
>> > > - if (virtio_transport_init_zcopy_skb(vsk, skb,
>> > > - info->msg,
>> > > - pkt_len,
>> > > - can_zcopy)) {
>> > > - kfree_skb(skb);
>> > > - ret = -ENOMEM;
>> > > - break;
>> > > - }
>> > > - }
>> > > + skb_zcopy_set(skb, uarg, NULL);
>> > >
>> > > virtio_transport_inc_tx_pkt(vvs, skb);
>> > >
>> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> > >
>> > > virtio_transport_put_credit(vvs, rest_len);
>> > >
>> > > + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
>> > > + * skb_zcopy_set() increases it for each skb, so we can drop that
>> > ^ must
>> >
>> > > + * initial reference to keep it balanced.
>> > > + */
>> > > + if (have_uref) {
>> > > + if (rest_len == pkt_len)
>> > > + /* No data sent, abort the notification. */
>> > > + net_zcopy_put_abort(uarg, true);
>> >
>> > Is it worth optimising for the 'nothing sent' case ?
>>
>> What do you suggest doing?
>>
>> I followed what TCP does.
>>
>> Thanks,
>> Stefano
>>
>> >
>> > -- David
>> >
>> > > + else
>> > > + net_zcopy_put(uarg);
>> > > + }
>> > > +
>> > > /* Return number of bytes, if any data has been sent. */
>> > > if (rest_len != pkt_len)
>> > > ret = pkt_len - rest_len;
>> >
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-18 9:54 ` Stefano Garzarella
@ 2026-05-18 10:50 ` David Laight
2026-05-18 11:08 ` Stefano Garzarella
[not found] ` <20260519053951.1C60440015@mx4.sberdevices.ru>
0 siblings, 2 replies; 13+ messages in thread
From: David Laight @ 2026-05-18 10:50 UTC (permalink / raw)
To: Stefano Garzarella
Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
Simon Horman, Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Mon, 18 May 2026 11:54:19 +0200
Stefano Garzarella <sgarzare@redhat.com> wrote:
> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
> >> > On Thu, 14 May 2026 11:29:48 +0200
> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> >> >
> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> >> > >
> >> > > When a large message is fragmented into multiple skbs, the zerocopy
> >> > > uarg is only allocated and attached to the last skb in the loop.
> >> > > Non-final skbs carry pinned user pages with no completion tracking,
> >> > > so the kernel has no way to notify userspace when those pages are safe
> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
> >> > > leaking pinned pages with no completion notification.
> >> > >
> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
> >> > > uarg (if not provided by the caller) before the send loop and attach
> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
> >> > > Each skb's completion properly decrements the refcount, and the
> >> > > notification only fires after the last skb is freed.
> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
> >> > > net_zcopy_put_abort().
> >> > >
> >> > > This issue was initially discovered by sashiko while reviewing commit
> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> >> > > but was pre-existing.
> >> > >
> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> >> > > ---
> >> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> >> > > 1 file changed, 34 insertions(+), 49 deletions(-)
> >> > >
> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> >> > > index 989cc252d3d3..1e3409d28164 100644
> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> >> > > return true;
> >> > > }
> >> > >
> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> >> > > - struct sk_buff *skb,
> >> > > - struct msghdr *msg,
> >> > > - size_t pkt_len,
> >> > > - bool zerocopy)
> >> > > -{
> >> > > - struct ubuf_info *uarg;
> >> > > -
> >> > > - if (msg->msg_ubuf) {
> >> > > - uarg = msg->msg_ubuf;
> >> > > - net_zcopy_get(uarg);
> >> > > - } else {
> >> > > - struct ubuf_info_msgzc *uarg_zc;
> >> > > -
> >> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> >> > > - pkt_len, NULL, false);
> >> > > - if (!uarg)
> >> > > - return -1;
> >> > > -
> >> > > - uarg_zc = uarg_to_msgzc(uarg);
> >> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> >> > > - }
> >> > > -
> >> > > - skb_zcopy_init(skb, uarg);
> >> > > -
> >> > > - return 0;
> >> > > -}
> >> > > -
> >> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
> >> > > struct virtio_vsock_pkt_info *info,
> >> > > size_t len,
> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > > u32 src_cid, src_port, dst_cid, dst_port;
> >> > > const struct virtio_transport *t_ops;
> >> > > struct virtio_vsock_sock *vvs;
> >> > > + struct ubuf_info *uarg = NULL;
> >> > > u32 pkt_len = info->pkt_len;
> >> > > bool can_zcopy = false;
> >> > > + bool have_uref = false;
> >> > > u32 rest_len;
> >> > > int ret;
> >> > >
> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > > if (can_zcopy)
> >> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> >> > > (MAX_SKB_FRAGS * PAGE_SIZE));
> >> > > +
> >> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> >> > > + info->op == VIRTIO_VSOCK_OP_RW) {
> >> > > + uarg = info->msg->msg_ubuf;
> >> > > +
> >> > > + if (!uarg) {
> >> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> >> > > + pkt_len, NULL, false);
> >> > > + if (!uarg) {
> >> > > + virtio_transport_put_credit(vvs, pkt_len);
> >> > > + return -ENOMEM;
> >> > > + }
> >> > > +
> >> > > + if (!can_zcopy)
> >> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
> >> > > +
> >> > > + have_uref = true;
> >> > > + }
> >> > > + }
> >> >
> >> > Surely that block should only be done if can_zcopy is true?
> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
> >> >
> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
> >> > The caller has already requested a callback - and you can't add another.
> >> >
> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
> >>
> >> I kept the same approach we had before, trying to make as few changes as
> >> possible.
> >>
> >> All these potential issues seem to be pre-existing and should be eventually
> >> addressed in other patches IMHO. This patch one only resolves the main issue
> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
> >
> >the patch is upstream now, right? So pretty much have to be patches on
> >top.
>
> If those are actual issues, then yes. TBH, I didn’t look into that
> aspect and left it as it was before. We should take a closer look at how
> MSG_ZEROCOPY is handled.
>
> David, if you think it needs fixing and you have time, feel free to send
> patches on top.
I'm not fully sure how it all works.
Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
be added to all the skb even if the ZEROCOPY flag isn't set.
I was just reading the one function.
But there did look like some very dodgy conditionals.
-- David
>
> Thanks,
> Stefano
>
> >
> >> @Arseniy can you help on this?
> >>
> >> >
> >> > > }
> >> > >
> >> > > rest_len = pkt_len;
> >> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > > break;
> >> > > }
> >> > >
> >> > > - /* We process buffer part by part, allocating skb on
> >> > > - * each iteration. If this is last skb for this buffer
> >> > > - * and MSG_ZEROCOPY mode is in use - we must allocate
> >> > > - * completion for the current syscall.
> >> > > - *
> >> > > - * Pass pkt_len because msg iter is already consumed
> >> > > - * by virtio_transport_fill_skb(), so iter->count
> >> > > - * can not be used for RLIMIT_MEMLOCK pinned-pages
> >> > > - * accounting done by msg_zerocopy_realloc().
> >> > > - */
> >> > > - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> >> > > - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> >> > > - if (virtio_transport_init_zcopy_skb(vsk, skb,
> >> > > - info->msg,
> >> > > - pkt_len,
> >> > > - can_zcopy)) {
> >> > > - kfree_skb(skb);
> >> > > - ret = -ENOMEM;
> >> > > - break;
> >> > > - }
> >> > > - }
> >> > > + skb_zcopy_set(skb, uarg, NULL);
> >> > >
> >> > > virtio_transport_inc_tx_pkt(vvs, skb);
> >> > >
> >> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> >> > >
> >> > > virtio_transport_put_credit(vvs, rest_len);
> >> > >
> >> > > + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> >> > > + * skb_zcopy_set() increases it for each skb, so we can drop that
> >> > ^ must
> >> >
> >> > > + * initial reference to keep it balanced.
> >> > > + */
> >> > > + if (have_uref) {
> >> > > + if (rest_len == pkt_len)
> >> > > + /* No data sent, abort the notification. */
> >> > > + net_zcopy_put_abort(uarg, true);
> >> >
> >> > Is it worth optimising for the 'nothing sent' case ?
> >>
> >> What do you suggest doing?
> >>
> >> I followed what TCP does.
> >>
> >> Thanks,
> >> Stefano
> >>
> >> >
> >> > -- David
> >> >
> >> > > + else
> >> > > + net_zcopy_put(uarg);
> >> > > + }
> >> > > +
> >> > > /* Return number of bytes, if any data has been sent. */
> >> > > if (rest_len != pkt_len)
> >> > > ret = pkt_len - rest_len;
> >> >
> >
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-18 10:50 ` David Laight
@ 2026-05-18 11:08 ` Stefano Garzarella
[not found] ` <20260519053951.1C60440015@mx4.sberdevices.ru>
1 sibling, 0 replies; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-18 11:08 UTC (permalink / raw)
To: David Laight
Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
Simon Horman, Arseniy Krasnov, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>On Mon, 18 May 2026 11:54:19 +0200
>Stefano Garzarella <sgarzare@redhat.com> wrote:
>
>> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>> >> > On Thu, 14 May 2026 11:29:48 +0200
>> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>> >> >
>> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
>> >> > >
>> >> > > When a large message is fragmented into multiple skbs, the zerocopy
>> >> > > uarg is only allocated and attached to the last skb in the loop.
>> >> > > Non-final skbs carry pinned user pages with no completion tracking,
>> >> > > so the kernel has no way to notify userspace when those pages are safe
>> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>> >> > > leaking pinned pages with no completion notification.
>> >> > >
>> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
>> >> > > uarg (if not provided by the caller) before the send loop and attach
>> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>> >> > > Each skb's completion properly decrements the refcount, and the
>> >> > > notification only fires after the last skb is freed.
>> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
>> >> > > net_zcopy_put_abort().
>> >> > >
>> >> > > This issue was initially discovered by sashiko while reviewing commit
>> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>> >> > > but was pre-existing.
>> >> > >
>> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>> >> > > ---
>> >> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>> >> > > 1 file changed, 34 insertions(+), 49 deletions(-)
>> >> > >
>> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>> >> > > index 989cc252d3d3..1e3409d28164 100644
>> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
>> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>> >> > > return true;
>> >> > > }
>> >> > >
>> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>> >> > > - struct sk_buff *skb,
>> >> > > - struct msghdr *msg,
>> >> > > - size_t pkt_len,
>> >> > > - bool zerocopy)
>> >> > > -{
>> >> > > - struct ubuf_info *uarg;
>> >> > > -
>> >> > > - if (msg->msg_ubuf) {
>> >> > > - uarg = msg->msg_ubuf;
>> >> > > - net_zcopy_get(uarg);
>> >> > > - } else {
>> >> > > - struct ubuf_info_msgzc *uarg_zc;
>> >> > > -
>> >> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > - pkt_len, NULL, false);
>> >> > > - if (!uarg)
>> >> > > - return -1;
>> >> > > -
>> >> > > - uarg_zc = uarg_to_msgzc(uarg);
>> >> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
>> >> > > - }
>> >> > > -
>> >> > > - skb_zcopy_init(skb, uarg);
>> >> > > -
>> >> > > - return 0;
>> >> > > -}
>> >> > > -
>> >> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
>> >> > > struct virtio_vsock_pkt_info *info,
>> >> > > size_t len,
>> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > > u32 src_cid, src_port, dst_cid, dst_port;
>> >> > > const struct virtio_transport *t_ops;
>> >> > > struct virtio_vsock_sock *vvs;
>> >> > > + struct ubuf_info *uarg = NULL;
>> >> > > u32 pkt_len = info->pkt_len;
>> >> > > bool can_zcopy = false;
>> >> > > + bool have_uref = false;
>> >> > > u32 rest_len;
>> >> > > int ret;
>> >> > >
>> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>> >> > > if (can_zcopy)
>> >> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>> >> > > (MAX_SKB_FRAGS * PAGE_SIZE));
>> >> > > +
>> >> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
>> >> > > + info->op == VIRTIO_VSOCK_OP_RW) {
>> >> > > + uarg = info->msg->msg_ubuf;
>> >> > > +
>> >> > > + if (!uarg) {
>> >> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>> >> > > + pkt_len, NULL, false);
>> >> > > + if (!uarg) {
>> >> > > + virtio_transport_put_credit(vvs, pkt_len);
>> >> > > + return -ENOMEM;
>> >> > > + }
>> >> > > +
>> >> > > + if (!can_zcopy)
>> >> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
>> >> > > +
>> >> > > + have_uref = true;
>> >> > > + }
>> >> > > + }
>> >> >
>> >> > Surely that block should only be done if can_zcopy is true?
>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>> >> >
>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>> >> > The caller has already requested a callback - and you can't add another.
>> >> >
>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>> >>
>> >> I kept the same approach we had before, trying to make as few changes as
>> >> possible.
>> >>
>> >> All these potential issues seem to be pre-existing and should be eventually
>> >> addressed in other patches IMHO. This patch one only resolves the main issue
>> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>> >
>> >the patch is upstream now, right? So pretty much have to be patches on
>> >top.
>>
>> If those are actual issues, then yes. TBH, I didn’t look into that
>> aspect and left it as it was before. We should take a closer look at how
>> MSG_ZEROCOPY is handled.
>>
>> David, if you think it needs fixing and you have time, feel free to send
>> patches on top.
>
>I'm not fully sure how it all works.
Same here, so I pinged Arseniy who worked on that, since it seemed
deliberate to have `can_zcopy` (and set `uarg->zerocopy` accordingly)
only when it was supported by the transport.
>Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
>be added to all the skb even if the ZEROCOPY flag isn't set.
>I was just reading the one function.
>But there did look like some very dodgy conditionals.
I see, let's wait for Arseniy's feedback; otherwise, I'll try to fix it
next week. As mentioned, this issue existed before this patch, so it
shouldn't be a regression.
Thanks,
Stefano
^ permalink raw reply [flat|nested] 13+ messages in thread[parent not found: <20260519053951.1C60440015@mx4.sberdevices.ru>]
* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
[not found] ` <20260519053951.1C60440015@mx4.sberdevices.ru>
@ 2026-05-19 6:37 ` Arseniy Krasnov
2026-05-19 9:49 ` Stefano Garzarella
0 siblings, 1 reply; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-19 6:37 UTC (permalink / raw)
To: Stefano Garzarella, David Laight
Cc: Michael S. Tsirkin, netdev, Jakub Kicinski, Paolo Abeni,
Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On 18/05/2026 14:08, Stefano Garzarella wrote:
> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>> On Mon, 18 May 2026 11:54:19 +0200
>> Stefano Garzarella <sgarzare@redhat.com> wrote:
>>
>>> On Mon, May 18, 2026 at 05:33:08AM -0400, Michael S. Tsirkin wrote:
>>> >On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
>>> >> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
>>> >> > On Thu, 14 May 2026 11:29:48 +0200
>>> >> > Stefano Garzarella <sgarzare@redhat.com> wrote:
>>> >> >
>>> >> > > From: Stefano Garzarella <sgarzare@redhat.com>
>>> >> > >
>>> >> > > When a large message is fragmented into multiple skbs, the zerocopy
>>> >> > > uarg is only allocated and attached to the last skb in the loop.
>>> >> > > Non-final skbs carry pinned user pages with no completion tracking,
>>> >> > > so the kernel has no way to notify userspace when those pages are safe
>>> >> > > to reuse. If the loop breaks early the uarg is never allocated at all,
>>> >> > > leaking pinned pages with no completion notification.
>>> >> > >
>>> >> > > Fix this by following the approach used by TCP: allocate the zerocopy
>>> >> > > uarg (if not provided by the caller) before the send loop and attach
>>> >> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
>>> >> > > Each skb's completion properly decrements the refcount, and the
>>> >> > > notification only fires after the last skb is freed.
>>> >> > > On failure, if no data was sent, the uarg is cleanly aborted via
>>> >> > > net_zcopy_put_abort().
>>> >> > >
>>> >> > > This issue was initially discovered by sashiko while reviewing commit
>>> >> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
>>> >> > > but was pre-existing.
>>> >> > >
>>> >> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
>>> >> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
>>> >> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
>>> >> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
>>> >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>>> >> > > ---
>>> >> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
>>> >> > > 1 file changed, 34 insertions(+), 49 deletions(-)
>>> >> > >
>>> >> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>>> >> > > index 989cc252d3d3..1e3409d28164 100644
>>> >> > > --- a/net/vmw_vsock/virtio_transport_common.c
>>> >> > > +++ b/net/vmw_vsock/virtio_transport_common.c
>>> >> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
>>> >> > > return true;
>>> >> > > }
>>> >> > >
>>> >> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
>>> >> > > - struct sk_buff *skb,
>>> >> > > - struct msghdr *msg,
>>> >> > > - size_t pkt_len,
>>> >> > > - bool zerocopy)
>>> >> > > -{
>>> >> > > - struct ubuf_info *uarg;
>>> >> > > -
>>> >> > > - if (msg->msg_ubuf) {
>>> >> > > - uarg = msg->msg_ubuf;
>>> >> > > - net_zcopy_get(uarg);
>>> >> > > - } else {
>>> >> > > - struct ubuf_info_msgzc *uarg_zc;
>>> >> > > -
>>> >> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>>> >> > > - pkt_len, NULL, false);
>>> >> > > - if (!uarg)
>>> >> > > - return -1;
>>> >> > > -
>>> >> > > - uarg_zc = uarg_to_msgzc(uarg);
>>> >> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
>>> >> > > - }
>>> >> > > -
>>> >> > > - skb_zcopy_init(skb, uarg);
>>> >> > > -
>>> >> > > - return 0;
>>> >> > > -}
>>> >> > > -
>>> >> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
>>> >> > > struct virtio_vsock_pkt_info *info,
>>> >> > > size_t len,
>>> >> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>> >> > > u32 src_cid, src_port, dst_cid, dst_port;
>>> >> > > const struct virtio_transport *t_ops;
>>> >> > > struct virtio_vsock_sock *vvs;
>>> >> > > + struct ubuf_info *uarg = NULL;
>>> >> > > u32 pkt_len = info->pkt_len;
>>> >> > > bool can_zcopy = false;
>>> >> > > + bool have_uref = false;
>>> >> > > u32 rest_len;
>>> >> > > int ret;
>>> >> > >
>>> >> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
>>> >> > > if (can_zcopy)
>>> >> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
>>> >> > > (MAX_SKB_FRAGS * PAGE_SIZE));
>>> >> > > +
>>> >> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
>>> >> > > + info->op == VIRTIO_VSOCK_OP_RW) {
>>> >> > > + uarg = info->msg->msg_ubuf;
>>> >> > > +
>>> >> > > + if (!uarg) {
>>> >> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
>>> >> > > + pkt_len, NULL, false);
>>> >> > > + if (!uarg) {
>>> >> > > + virtio_transport_put_credit(vvs, pkt_len);
>>> >> > > + return -ENOMEM;
>>> >> > > + }
>>> >> > > +
>>> >> > > + if (!can_zcopy)
>>> >> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
>>> >> > > +
>>> >> > > + have_uref = true;
>>> >> > > + }
>>> >> > > + }
>>> >> >
Hi guys! Just some replies after quick look:
>>> >> > Surely that block should only be done if can_zcopy is true?
I guess no, since TCP also allocates uarg even if zerocopy is impossible -
it just sets uarg_to_msgzc(uarg)->zerocopy = 0;
>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.
>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
Here I guess it is better to follow TCP way to make same behaviour, because exact
logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.
>>> >> >
>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>> >> > The caller has already requested a callback - and you can't add another.
In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
the same way as 'msg_ubuf' is NULL.
>>> >> >
>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>> >>
Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
'can_zcopy' was false).
@Stefano, I guess current implementation differs from TCP in two cases (at least from first
view):
1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
(if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some extra checks which are
missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.
But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
current TCP flow as close as possible:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
2) To copy data we can use 'skb_zerocopy_iter_stream()'.
3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.
I can take this task, but pls need some time - may be two/three weeks due to another tasks.
What do You think?
Thanks
>>> >> I kept the same approach we had before, trying to make as few changes as
>>> >> possible.
>>> >>
>>> >> All these potential issues seem to be pre-existing and should be eventually
>>> >> addressed in other patches IMHO. This patch one only resolves the main issue
>>> >> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
>>> >
>>> >the patch is upstream now, right? So pretty much have to be patches on
>>> >top.
>>>
>>> If those are actual issues, then yes. TBH, I didn’t look into that
>>> aspect and left it as it was before. We should take a closer look at how
>>> MSG_ZEROCOPY is handled.
>>>
>>> David, if you think it needs fixing and you have time, feel free to send
>>> patches on top.
>>
>> I'm not fully sure how it all works.
>
> Same here, so I pinged Arseniy who worked on that, since it seemed deliberate to have `can_zcopy` (and set `uarg->zerocopy` accordingly) only when it was supported by the transport.
>
>> Especially the paths where msg->msg_ubuf is non-NULL, I suspect it should
>> be added to all the skb even if the ZEROCOPY flag isn't set.
>> I was just reading the one function.
>> But there did look like some very dodgy conditionals.
>
> I see, let's wait for Arseniy's feedback; otherwise, I'll try to fix it next week. As mentioned, this issue existed before this patch, so it shouldn't be a regression.
>
> Thanks,
> Stefano
>
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-19 6:37 ` Arseniy Krasnov
@ 2026-05-19 9:49 ` Stefano Garzarella
2026-05-19 10:40 ` Arseniy Krasnov
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Garzarella @ 2026-05-19 9:49 UTC (permalink / raw)
To: Arseniy Krasnov
Cc: David Laight, Michael S. Tsirkin, netdev, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On Tue, May 19, 2026 at 09:37:23AM +0300, Arseniy Krasnov wrote:
>On 18/05/2026 14:08, Stefano Garzarella wrote:
>> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>>> On Mon, 18 May 2026 11:54:19 +0200
>>> Stefano Garzarella <sgarzare@redhat.com> wrote:
[...]
>
>Hi guys! Just some replies after quick look:
>
>>>> >> > Surely that block should only be done if can_zcopy is true?
>
>I guess no, since TCP also allocates uarg even if zerocopy is impossible -
>it just sets uarg_to_msgzc(uarg)->zerocopy = 0;
>
>>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>
>Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
>because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
>'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.
>
>>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>
>Here I guess it is better to follow TCP way to make same behaviour, because exact
>logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.
>
>>>> >> >
>>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>>> >> > The caller has already requested a callback - and you can't add another.
>
>In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
>the same way as 'msg_ubuf' is NULL.
>
>>>> >> >
>>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>>> >>
>
>Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
>allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
>'can_zcopy' was false).
>
>
>@Stefano, I guess current implementation differs from TCP in two cases (at least from first
>view):
>1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
> (if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
> 'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
> no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
>2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some
>extra checks which are
> missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.
>
>But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
>current TCP flow as close as possible:
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
>1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
>2) To copy data we can use 'skb_zerocopy_iter_stream()'.
>3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.
>
>I can take this task, but pls need some time - may be two/three weeks due to another tasks.
>
>What do You think?
If you can work on it, please go head. They seems all pre-existing, so
2/3 weeks are fine. Please let me know if you can't and I'll try to
allocate some time.
Thanks for the help!
Stefano
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
2026-05-19 9:49 ` Stefano Garzarella
@ 2026-05-19 10:40 ` Arseniy Krasnov
0 siblings, 0 replies; 13+ messages in thread
From: Arseniy Krasnov @ 2026-05-19 10:40 UTC (permalink / raw)
To: Stefano Garzarella
Cc: David Laight, Michael S. Tsirkin, netdev, Jakub Kicinski,
Paolo Abeni, Simon Horman, Stefan Hajnoczi, kvm, Eric Dumazet,
Eugenio Pérez, Xuan Zhuo, virtualization, David S. Miller,
Jason Wang, linux-kernel, Maher Azzouzi
On 19/05/2026 12:49, Stefano Garzarella wrote:
> On Tue, May 19, 2026 at 09:37:23AM +0300, Arseniy Krasnov wrote:
>> On 18/05/2026 14:08, Stefano Garzarella wrote:
>>> On Mon, May 18, 2026 at 11:50:05AM +0100, David Laight wrote:
>>>> On Mon, 18 May 2026 11:54:19 +0200
>>>> Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> [...]
>
>>
>> Hi guys! Just some replies after quick look:
>>
>>>>> >> > Surely that block should only be done if can_zcopy is true?
>>
>> I guess no, since TCP also allocates uarg even if zerocopy is impossible -
>> it just sets uarg_to_msgzc(uarg)->zerocopy = 0;
>>
>>>>> >> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
>>
>> Hm, we can't enter block 'if (info->msg) {' when 'info->op' is not equal to 'VIRTIO_VSOCK_OP_RW',
>> because 'msg' is not NULL only for 'VIRTIO_VSOCK_OP_RW'. But anyway You point to right thing - check for
>> 'VIRTIO_VSOCK_OP_RW' could be removed here. Just with comment why.
>>
>>>>> >> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
>>
>> Here I guess it is better to follow TCP way to make same behaviour, because exact
>> logic for MSG_ZEROCOPY is not documented. TCP also returns error and stops tx loop.
>>
>>>>> >> >
>>>>> >> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
>>>>> >> > The caller has already requested a callback - and you can't add another.
>>
>> In TCP implementation if 'msg_ubuf' is set they just use it and check for zerocopy in
>> the same way as 'msg_ubuf' is NULL.
>>
>>>>> >> >
>>>>> >> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>>>>> >>
>>
>> Need to check it more. But 'can_zcopy' means that we fill frags in skb, have_uref means that we
>> allocated completion (but it could be reported with not set 'SO_EE_CODE_ZEROCOPY_COPIED' if
>> 'can_zcopy' was false).
>>
>>
>> @Stefano, I guess current implementation differs from TCP in two cases (at least from first
>> view):
>> 1) When 'msg_ubuf' is set: in TCP, already set 'msg_ubuf' is passed to 'skb_zerocopy_iter_stream()'
>> (if zerocopy is possible) where it is used as in vsock in call 'skb_zcopy_set()'. In vsock case if
>> 'msg_ubuf' is not NULL we will pass just NULL to 'skb_zcopy_set()'. Yes this is will be
>> no-effect call today (due to checks in 'skb_zcopy_set()'), but anyway - in future may be not.
>> 2) Also i see that 'skb_zerocopy_iter_stream()' in TCP version has some extra checks which are
>> missed in vsock - we only just call '__zerocopy_sg_from_iter()' to fill skb in zerocopy way.
>>
>> But, I think, instead of trying to compare vsock and TCP versions best way is to just copy
>> current TCP flow as close as possible:
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/net/ipv4/tcp.c#n1144
>> 1) Use same flow of checks for 'msg_flags', 'msg_ubuf', SOCK_ZEROCOPY etc.
>> 2) To copy data we can use 'skb_zerocopy_iter_stream()'.
>> 3) The only thing that we don't need in vsock is dev mem related code from TCP implementation.
>>
>> I can take this task, but pls need some time - may be two/three weeks due to another tasks.
>>
>> What do You think?
>
> If you can work on it, please go head. They seems all pre-existing, so 2/3 weeks are fine. Please let me know if you can't and I'll try to allocate some time.
No problem I'll take it. If something goes wrong or stuck - i'll let you know
Thanks
>
> Thanks for the help!
>
> Stefano
>
^ permalink raw reply [flat|nested] 13+ messages in thread