From: Stefano Garzarella <sgarzare@redhat.com>
To: Octavian Purdila <tavip@google.com>
Cc: netdev@vger.kernel.org,
syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Arseniy Krasnov" <avkrasnov@salutedevices.com>,
kvm@vger.kernel.org, virtualization@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH net] vsock/virtio: restore msg_iter on transmission failure
Date: Tue, 9 Jun 2026 10:48:49 +0200 [thread overview]
Message-ID: <aifL0f_QO1kocccU@sgarzare-redhat> (raw)
In-Reply-To: <20260609004809.1285028-1-tavip@google.com>
On Tue, Jun 09, 2026 at 12:48:05AM +0000, Octavian Purdila wrote:
>When transmission fails in virtio_transport_send_pkt_info, the msg_iter
>might have been partially advanced. If we don't restore it, the next
>attempt to send data will use an incorrect iterator state, leading to
>desync and warnings like "send_pkt() returns 0, but X expected".
Thanks for the fix! I have some comments.
>
>Specifically, this can happen in the following scenario, triggered by
>the syzkaller repro:
>
>1. A write-only VMA (PROT_WRITE only) is partially populated by a
> prior TUN write that failed with -EIO but still faulted in some
> pages).
>2. A vsock sendmmsg call with MSG_ZEROCOPY requests transmission of a
> buffer from this VMA.
>3. The first packet (64KB) is sent successfully because the pages are
> populated.
>4. The second packet allocation fails because GUP fast pins the first page
> but GUP slow fails on the next unpopulated page due to PROT_WRITE-only
> permissions.
>5. The iterator is advanced by the partially successful GUP (68KB total
> advanced: 64KB from first packet + 4KB from second), but the send loop
> breaks and only reports 64KB sent. This creates a 4KB desync.
>6. The next retry starts with a non-zero iov_offset, disabling zerocopy
> and falling back to copy mode.
>7. In copy mode, the transmission succeeds for the next packets but
> exhausts the iterator early because of the desync.
>8. The final retry sees an empty iterator but zerocopy is re-enabled
> (offset resets). It attempts to send the remaining bytes with zerocopy
> but pins 0 pages, creating an empty packet.
>9. The transport sends the empty packet, triggering the warning because
> the returned bytes (header only) do not match the expected payload size.
>10. The loop continues to spin, allocating ubuf_info each time, eventually
> exhausting sysctl_optmem_max and returning -ENOMEM to userspace.
>
>Restore msg_iter to its original state before the packet allocation
>and transmission attempt if they fail.
>
>Fixes: e0718bd82e27 ("vsock: enable setting SO_ZEROCOPY")
>Reported-by: syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com
>Closes: https://syzkaller.appspot.com/bug?extid=28e5f3d207b14bae122a
>Assisted-by: gemini:gemini-3.1-pro
>Signed-off-by: Octavian Purdila <tavip@google.com>
>---
> net/vmw_vsock/virtio_transport_common.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
>diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
>index b10666937c490..588623a3e2bbc 100644
>--- a/net/vmw_vsock/virtio_transport_common.c
>+++ b/net/vmw_vsock/virtio_transport_common.c
>@@ -367,6 +367,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> do {
> struct sk_buff *skb;
> size_t skb_len;
>+ struct iov_iter saved_iter;
trivial: reverse xmas tree:
https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs
>+
>+ if (info->msg)
>+ saved_iter = info->msg->msg_iter;
What about using iov_iter_save_state()/iov_iter_restore() ?
IIUC we may need to export iov_iter_restore(), so not a strong opinion,
but it looks better to use those API IMHO.
>
> skb_len = min(max_skb_len, rest_len);
>
>@@ -375,6 +379,8 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> src_cid, src_port,
> dst_cid, dst_port);
What about adding a comment on top of virtio_transport_alloc_skb() call
(or when we save the state) to explain that in specific cases it can
advance the msg_iter ?
> if (!skb) {
>+ if (info->msg)
>+ info->msg->msg_iter = saved_iter;
> ret = -ENOMEM;
> break;
> }
>@@ -382,8 +388,11 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
> virtio_transport_inc_tx_pkt(vvs, skb);
>
> ret = t_ops->send_pkt(skb, info->net);
>- if (ret < 0)
>+ if (ret < 0) {
>+ if (info->msg)
>+ info->msg->msg_iter = saved_iter;
Also, what about having a single restore point after the loop?
I mean something like this (untested):
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b10666937c49..2f3c6c82c155 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -295,6 +295,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
u32 src_cid, src_port, dst_cid, dst_port;
const struct virtio_transport *t_ops;
+ struct iov_iter_state msg_iter_state;
struct virtio_vsock_sock *vvs;
struct ubuf_info *uarg = NULL;
u32 pkt_len = info->pkt_len;
@@ -368,6 +369,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
struct sk_buff *skb;
size_t skb_len;
+ if (info->msg)
+ iov_iter_save_state(&info->msg->msg_iter, &msg_iter_state);
+
skb_len = min(max_skb_len, rest_len);
skb = virtio_transport_alloc_skb(info, skb_len, can_zcopy,
@@ -399,6 +403,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
break;
} while (rest_len);
+ if (info->msg && ret < 0)
+ iov_iter_restore(&info->msg->msg_iter, &msg_iter_state);
+
virtio_transport_put_credit(vvs, rest_len);
/* msg_zerocopy_realloc() initializes the ubuf_info refcnt to 1.
Thanks,
Stefano
> break;
>+ }
>
> /* Both virtio and vhost 'send_pkt()' returns 'skb_len',
> * but for reliability use 'ret' instead of 'skb_len'.
>--
>2.54.0.1064.gd145956f57-goog
>
next prev parent reply other threads:[~2026-06-09 8:48 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 0:48 [PATCH net] vsock/virtio: restore msg_iter on transmission failure Octavian Purdila
2026-06-09 8:48 ` Stefano Garzarella [this message]
2026-06-09 17:58 ` Octavian Purdila
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aifL0f_QO1kocccU@sgarzare-redhat \
--to=sgarzare@redhat.com \
--cc=avkrasnov@salutedevices.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=eperezma@redhat.com \
--cc=horms@kernel.org \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stefanha@redhat.com \
--cc=syzbot+28e5f3d207b14bae122a@syzkaller.appspotmail.com \
--cc=tavip@google.com \
--cc=virtualization@lists.linux.dev \
--cc=xuanzhuo@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox