Netdev List
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: netdev@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	virtualization@lists.linux.dev,
	"David S. Miller" <davem@davemloft.net>,
	"Jason Wang" <jasowang@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	linux-kernel@vger.kernel.org, "Paolo Abeni" <pabeni@redhat.com>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	kvm@vger.kernel.org, "Jakub Kicinski" <kuba@kernel.org>,
	"Eugenio Pérez" <eperezma@redhat.com>
Subject: Re: [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc
Date: Mon, 11 May 2026 08:46:20 -0400	[thread overview]
Message-ID: <20260511084551-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAGxU2F65pw0P1uKOOJpBc4p6KjCz8iePEh9aPybUyECBm1=otg@mail.gmail.com>

On Mon, May 11, 2026 at 12:54:44PM +0200, Stefano Garzarella wrote:
> On Fri, 8 May 2026 at 12:38, Stefano Garzarella <sgarzare@redhat.com> 
> wrote:
> >
> > On Fri, May 08, 2026 at 06:33:13AM -0400, Michael S. Tsirkin wrote:
> > >On Fri, May 08, 2026 at 12:01:50PM +0200, Stefano Garzarella wrote:
> > >> On Fri, May 08, 2026 at 05:53:07AM -0400, Michael S. Tsirkin wrote:
> > >> > On Fri, May 08, 2026 at 11:23:30AM +0200, Stefano Garzarella wrote:
> > >> > > From: Stefano Garzarella <sgarzare@redhat.com>
> > >> > >
> > >> > > After commit 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb
> > >> > > queue"), virtio_transport_inc_rx_pkt() subtracts per-skb overhead from
> > >> > > buf_alloc when checking whether a new packet fits. This reduces the
> > >> > > effective receive buffer below what the user configured via
> > >> > > SO_VM_SOCKETS_BUFFER_SIZE, causing legitimate data packets to be
> > >> > > silently dropped and applications that rely on the full buffer size
> > >> > > to deadlock.
> > >> > >
> > >> > > Also, the reduced space is not communicated to the remote peer, so
> > >> > > its credit calculation accounts more credit than the receiver will
> > >> > > actually accept, causing data loss (there is no retransmission).
> > >> > >
> > >> > > This also causes failures in tools/testing/vsock/vsock_test.c.
> > >> > > Test 18 sometimes fails, while test 22 always fails in this way:
> > >> > >     18 - SOCK_STREAM MSG_ZEROCOPY...hash mismatch
> > >> > >
> > >> > >     22 - SOCK_STREAM virtio credit update + SO_RCVLOWAT...send failed:
> > >> > >     Resource temporarily unavailable
> > >> > >
> > >> > > Fix this by introducing virtio_transport_rx_buf_size() to calculate the
> > >> > > size of the RX buffer based on the overhead. Use it in the acceptance
> > >> > > check, the advertised buf_alloc, and the credit update decision.
> > >> > > Use buf_alloc * 2 as total budget (payload + overhead), similar to how
> > >> > > SO_RCVBUF is doubled to reserve space for sk_buff metadata.
> > >> > > The function returns buf_alloc as long as overhead fits within the
> > >> > > reservation, then gradually reduces toward 0 as overhead exceeds
> > >> > > buf_alloc (e.g. under small-packet flooding), informing the peer to
> > >> > > slow down.
> > >> > >
> > >> > > Fixes: 059b7dbd20a6 ("vsock/virtio: fix potential unbounded skb queue")
> > >> > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >> >
> > >> >
> > >> > unfortunately, this is a bit of a spec violation and there is no guarantee
> > >> > it helps.
> > >>
> > >> Losing data like we are doing in 059b7dbd20a6 is even worse IMHO.
> > >>
> > >> >
> > >> > a spec violation because the spec says:
> > >> > Only payload bytes are counted and header bytes are not
> > >> > included
> > >> >
> > >> > and the implication is that a side can not reduce its own buf_alloc.
> > >> >
> > >> > no guarantee because the other side is not required to process your
> > >> > packets, so it might not see your buf alloc reduction.
> > >> >
> > >> > as designed in the current spec, you can only increase your buf alloc,
> > >> > not decrease it.
> > >>
> > >> We never enforced this; currently a user can reduce it via
> > >> SO_VM_SOCKETS_BUFFER_SIZE, and we haven't blocked that since virtio-vsock
> > >> was introduced. Should we update the spec?
> > >
> > >
> > >it's not that we need to enforce it, it's that all synchronization
> > >assumes this. as in, anyone can use an old copy until they run out
> > >of credits.
> > >
> > >
> > >> >
> > >> > what can be done:
> > >> > - more efficient storage for small packets (poc i posted)
> > >> > - reduce buf alloc ahead of the time
> > >>
> > >> That's basically what I'm doing here: I'm using twice the size of
> > >> `buf_alloc` (just like `SO_RCVBUF` does for other socket types) and telling
> > >> the other peer just `buf_alloc`.
> > >>
> > >> But then, somehow, we have to let the other peer know that we're running
> > >> out of space. With this patch that only happens when the other peer isn't
> > >> behaving properly, sending so many small packets that the overhead exceeds
> > >> `buf_alloc`.
> > >>
> > >> Stefano
> > >
> > >what is "not proper" here? it is up to the application what to send.
> >
> > Sure, but here we're just slowing down the application by telling it we
> > don't have any more space.
> >
> > Again, without this patch we are just dropping data, which IMO is even
> > worse.
> >
> > So I think we should merge this for now, while we handle the EOM better.
> > If you prefer, I can drop the part where we reduce the buf_alloc
> > advertised to the other peer, but at least we should drop data after
> > `buf_alloc * 2` IMO.
> 
> Okay, I thought it over during the weekend, and I agree that this patch
> still doesn't solve the problem and would still result in packet loss.
> So, until we resolve the issue permanently, and since 059b7dbd20a6 is 
> coming to stable, I'd like to rework this patch so that we only start 
> dropping packets when the overhead plus the queued bytes exceeds 
> `buf_alloc * 2`.
> I'd remove the other changes that reduce the advertised buf_alloc, but
> terminate the connection so that both peers are aware that something
> went wrong.
> 
> Any objections?
> 
> Stefano

Let's try to first fix it upstream properly please. Discuss backporting
later.

-- 
MST



Thread overview: 9+ messages
2026-05-08  9:23 [PATCH net] vsock/virtio: fix skb overhead accounting to preserve full buf_alloc Stefano Garzarella
2026-05-08  9:53 ` Michael S. Tsirkin
2026-05-08 10:01   ` Stefano Garzarella
2026-05-08 10:33     ` Michael S. Tsirkin
2026-05-08 10:38       ` Stefano Garzarella
2026-05-11 10:54         ` Stefano Garzarella
2026-05-11 12:46           ` Michael S. Tsirkin [this message]
2026-05-11 13:17             ` Stefano Garzarella
2026-05-11 23:48               ` Jakub Kicinski
