From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: "David Laight" <david.laight.linux@gmail.com>,
netdev@vger.kernel.org, "Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Arseniy Krasnov" <avkrasnov@salutedevices.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
kvm@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
virtualization@lists.linux.dev,
"David S. Miller" <davem@davemloft.net>,
"Jason Wang" <jasowang@redhat.com>,
linux-kernel@vger.kernel.org,
"Maher Azzouzi" <maherazz04@gmail.com>
Subject: Re: [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends
Date: Mon, 18 May 2026 05:33:08 -0400 [thread overview]
Message-ID: <20260518053223-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <agrXvCtNRlySlwgx@sgarzare-redhat>
On Mon, May 18, 2026 at 11:18:24AM +0200, Stefano Garzarella wrote:
> On Sat, May 16, 2026 at 12:53:29PM +0100, David Laight wrote:
> > On Thu, 14 May 2026 11:29:48 +0200
> > Stefano Garzarella <sgarzare@redhat.com> wrote:
> >
> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > > When a large message is fragmented into multiple skbs, the zerocopy
> > > uarg is only allocated and attached to the last skb in the loop.
> > > Non-final skbs carry pinned user pages with no completion tracking,
> > > so the kernel has no way to notify userspace when those pages are safe
> > > to reuse. If the loop breaks early the uarg is never allocated at all,
> > > leaking pinned pages with no completion notification.
> > >
> > > Fix this by following the approach used by TCP: allocate the zerocopy
> > > uarg (if not provided by the caller) before the send loop and attach
> > > it to every skb via skb_zcopy_set(), which takes a reference per skb.
> > > Each skb's completion properly decrements the refcount, and the
> > > notification only fires after the last skb is freed.
> > > On failure, if no data was sent, the uarg is cleanly aborted via
> > > net_zcopy_put_abort().
> > >
> > > This issue was initially discovered by sashiko while reviewing commit
> > > 1cb36e252211 ("vsock/virtio: fix MSG_ZEROCOPY pinned-pages accounting")
> > > but was pre-existing.
> > >
> > > Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support")
> > > Cc: Arseniy Krasnov <avkrasnov@salutedevices.com>
> > > Closes: https://sashiko.dev/#/patchset/20260420132051.217589-1-sgarzare%40redhat.com
> > > Reported-by: Maher Azzouzi <maherazz04@gmail.com>
> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > ---
> > > net/vmw_vsock/virtio_transport_common.c | 83 ++++++++++---------------
> > > 1 file changed, 34 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
> > > index 989cc252d3d3..1e3409d28164 100644
> > > --- a/net/vmw_vsock/virtio_transport_common.c
> > > +++ b/net/vmw_vsock/virtio_transport_common.c
> > > @@ -70,34 +70,6 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops,
> > > return true;
> > > }
> > >
> > > -static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
> > > - struct sk_buff *skb,
> > > - struct msghdr *msg,
> > > - size_t pkt_len,
> > > - bool zerocopy)
> > > -{
> > > - struct ubuf_info *uarg;
> > > -
> > > - if (msg->msg_ubuf) {
> > > - uarg = msg->msg_ubuf;
> > > - net_zcopy_get(uarg);
> > > - } else {
> > > - struct ubuf_info_msgzc *uarg_zc;
> > > -
> > > - uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > - pkt_len, NULL, false);
> > > - if (!uarg)
> > > - return -1;
> > > -
> > > - uarg_zc = uarg_to_msgzc(uarg);
> > > - uarg_zc->zerocopy = zerocopy ? 1 : 0;
> > > - }
> > > -
> > > - skb_zcopy_init(skb, uarg);
> > > -
> > > - return 0;
> > > -}
> > > -
> > > static int virtio_transport_fill_skb(struct sk_buff *skb,
> > > struct virtio_vsock_pkt_info *info,
> > > size_t len,
> > > @@ -317,8 +289,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > u32 src_cid, src_port, dst_cid, dst_port;
> > > const struct virtio_transport *t_ops;
> > > struct virtio_vsock_sock *vvs;
> > > + struct ubuf_info *uarg = NULL;
> > > u32 pkt_len = info->pkt_len;
> > > bool can_zcopy = false;
> > > + bool have_uref = false;
> > > u32 rest_len;
> > > int ret;
> > >
> > > @@ -360,6 +334,25 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > if (can_zcopy)
> > > max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE,
> > > (MAX_SKB_FRAGS * PAGE_SIZE));
> > > +
> > > + if (info->msg->msg_flags & MSG_ZEROCOPY &&
> > > + info->op == VIRTIO_VSOCK_OP_RW) {
> > > + uarg = info->msg->msg_ubuf;
> > > +
> > > + if (!uarg) {
> > > + uarg = msg_zerocopy_realloc(sk_vsock(vsk),
> > > + pkt_len, NULL, false);
> > > + if (!uarg) {
> > > + virtio_transport_put_credit(vvs, pkt_len);
> > > + return -ENOMEM;
> > > + }
> > > +
> > > + if (!can_zcopy)
> > > + uarg_to_msgzc(uarg)->zerocopy = 0;
> > > +
> > > + have_uref = true;
> > > + }
> > > + }
> >
> > Surely that block should only be done if can_zcopy is true?
> > And shouldn't something unset it if info->op != VIRTIO_VSOCK_OP_RW ?
> > If the msg_zerocopy_realloc() fails then can't you just set can_zcopy to false.
> >
> > It info->msg->msg_buf is already set then I think you have to disable zero-copy.
> > The caller has already requested a callback - and you can't add another.
> >
> > In any case by the end of this can_zcopy and have_uref are really the same flag.
>
> I kept the same approach we had before, trying to make as few changes as
> possible.
>
> All these potential issues seem to be pre-existing and should be eventually
> addressed in other patches IMHO. This patch one only resolves the main issue
> of calling `skb_zcopy_set()` for every skb to avoid leaking pages, etc.
the patch is upstream now, right? So pretty much have to be patches on
top.
> @Arseniy can you help on this?
>
> >
> > > }
> > >
> > > rest_len = pkt_len;
> > > @@ -378,27 +371,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > > break;
> > > }
> > >
> > > - /* We process buffer part by part, allocating skb on
> > > - * each iteration. If this is last skb for this buffer
> > > - * and MSG_ZEROCOPY mode is in use - we must allocate
> > > - * completion for the current syscall.
> > > - *
> > > - * Pass pkt_len because msg iter is already consumed
> > > - * by virtio_transport_fill_skb(), so iter->count
> > > - * can not be used for RLIMIT_MEMLOCK pinned-pages
> > > - * accounting done by msg_zerocopy_realloc().
> > > - */
> > > - if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY &&
> > > - skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) {
> > > - if (virtio_transport_init_zcopy_skb(vsk, skb,
> > > - info->msg,
> > > - pkt_len,
> > > - can_zcopy)) {
> > > - kfree_skb(skb);
> > > - ret = -ENOMEM;
> > > - break;
> > > - }
> > > - }
> > > + skb_zcopy_set(skb, uarg, NULL);
> > >
> > > virtio_transport_inc_tx_pkt(vvs, skb);
> > >
> > > @@ -422,6 +395,18 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> > >
> > > virtio_transport_put_credit(vvs, rest_len);
> > >
> > > + /* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
> > > + * skb_zcopy_set() increases it for each skb, so we can drop that
> > ^ must
> >
> > > + * initial reference to keep it balanced.
> > > + */
> > > + if (have_uref) {
> > > + if (rest_len == pkt_len)
> > > + /* No data sent, abort the notification. */
> > > + net_zcopy_put_abort(uarg, true);
> >
> > Is it worth optimising for the 'nothing sent' case ?
>
> What do you suggest doing?
>
> I followed what TCP does.
>
> Thanks,
> Stefano
>
> >
> > -- David
> >
> > > + else
> > > + net_zcopy_put(uarg);
> > > + }
> > > +
> > > /* Return number of bytes, if any data has been sent. */
> > > if (rest_len != pkt_len)
> > > ret = pkt_len - rest_len;
> >
next prev parent reply other threads:[~2026-05-18 9:33 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-14 9:29 [PATCH net] vsock/virtio: fix zerocopy completion for multi-skb sends Stefano Garzarella
2026-05-14 14:07 ` Michael S. Tsirkin
2026-05-15 17:18 ` Arseniy Krasnov
2026-05-16 0:50 ` patchwork-bot+netdevbpf
2026-05-16 11:53 ` David Laight
2026-05-18 9:18 ` Stefano Garzarella
2026-05-18 9:33 ` Michael S. Tsirkin [this message]
2026-05-18 9:54 ` Stefano Garzarella
2026-05-18 10:50 ` David Laight
2026-05-18 11:08 ` Stefano Garzarella
[not found] ` <20260519053951.1C60440015@mx4.sberdevices.ru>
2026-05-19 6:37 ` Arseniy Krasnov
2026-05-19 9:49 ` Stefano Garzarella
2026-05-19 10:40 ` Arseniy Krasnov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260518053223-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=avkrasnov@salutedevices.com \
--cc=davem@davemloft.net \
--cc=david.laight.linux@gmail.com \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=horms@kernel.org \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maherazz04@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sgarzare@redhat.com \
--cc=stefanha@redhat.com \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox