From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: Alexander Graf <graf@amazon.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	virtualization@lists.linux.dev, kvm@vger.kernel.org,
	Asias He <asias@redhat.com>, Paolo Abeni <pabeni@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	"David S . Miller" <davem@davemloft.net>,
	nh-open-source@amazon.com
Subject: Re: [PATCH v2] vsock/virtio: Remove queued_replies pushback logic
Date: Fri, 4 Apr 2025 04:37:36 -0400
Message-ID: <20250404043326-mutt-send-email-mst@kernel.org>
In-Reply-To: <fiyxlnv7gglcfkr7ue4tiaktqjptdkr5or6skrr6f7dof26d56@wmg3zhhqlcoj>

On Fri, Apr 04, 2025 at 10:30:43AM +0200, Stefano Garzarella wrote:
> On Fri, Apr 04, 2025 at 04:14:51AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Apr 04, 2025 at 10:04:38AM +0200, Alexander Graf wrote:
> > > 
> > > On 03.04.25 14:21, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 02, 2025 at 12:14:24PM -0400, Stefan Hajnoczi wrote:
> > > > > On Tue, Apr 01, 2025 at 08:13:49PM +0000, Alexander Graf wrote:
> > > > > > > Ever since its introduction, the virtio vsock driver has included
> > > > > > > pushback logic that blocks it from taking any new RX packets until the
> > > > > > > TX queue backlog becomes shallower than the virtqueue size.
> > > > > >
> > > > > > This logic works fine when you connect a user space application on the
> > > > > > hypervisor with a virtio-vsock target, because the guest will stop
> > > > > > > receiving data until the host has pulled all outstanding data from the VM.
> > > > > >
> > > > > > With Nitro Enclaves however, we connect 2 VMs directly via vsock:
> > > > > >
> > > > > >    Parent      Enclave
> > > > > >
> > > > > >      RX -------- TX
> > > > > >      TX -------- RX
> > > > > >
> > > > > > This means we now have 2 virtio-vsock backends that both have the pushback
> > > > > > logic. If the parent's TX queue runs full at the same time as the
> > > > > > Enclave's, both virtio-vsock drivers fall into the pushback path and
> > > > > > no longer accept RX traffic. However, that RX traffic is TX traffic on
> > > > > > the other side which blocks that driver from making any forward
> > > > > > progress. We're now in a deadlock.
> > > > > >
> > > > > > To resolve this, let's remove that pushback logic altogether and rely on
> > > > > > higher levels (like credits) to ensure we do not consume unbounded
> > > > > > memory.
> > > > > The reason for queued_replies is that rx packet processing may emit tx
> > > > > packets. Therefore tx virtqueue space is required in order to process
> > > > > the rx virtqueue.
> > > > >
> > > > > queued_replies puts a bound on the amount of tx packets that can be
> > > > > queued in memory so the other side cannot consume unlimited memory. Once
> > > > > that bound has been reached, rx processing stops until the other side
> > > > > frees up tx virtqueue space.
> > > > >
> > > > > It's been a while since I looked at this problem, so I don't have a
> > > > > solution ready. In fact, last time I thought about it I wondered if the
> > > > > design of virtio-vsock fundamentally suffers from deadlocks.
> > > > >
> > > > > I don't think removing queued_replies is possible without a replacement
> > > > > for the bounded memory and virtqueue exhaustion issue though. Credits
> > > > > are not a solution - they are about socket buffer space, not about
> > > > > virtqueue space, which includes control packets that are not accounted
> > > > > by socket buffer space.
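
The two positions quoted above (pushback deadlocks when both ends apply it,
but removing it leaves the reply backlog unbounded) can be illustrated with a
toy model. This is only a sketch under stated assumptions, not the driver
code: VQ_SIZE, struct peer, tx_backlog, rx_one() and deadlocked() are
hypothetical names, and the model assumes every RX packet produces exactly
one reply.

```c
#include <stdbool.h>

#define VQ_SIZE 4	/* hypothetical virtqueue depth */

/* Hypothetical model of one end of a VM-to-VM vsock link. */
struct peer {
	int tx_backlog;	/* replies queued for TX, i.e. the other end's RX */
};

/* Pushback rule: refuse RX while the reply backlog has reached VQ_SIZE. */
static bool rx_allowed(const struct peer *p, bool pushback)
{
	return !pushback || p->tx_backlog < VQ_SIZE;
}

/*
 * Consume one packet sent by 'other'. Assumption: every packet needs a
 * reply, so our own backlog grows by one. Returns true on progress.
 */
static bool rx_one(struct peer *self, struct peer *other, bool pushback)
{
	if (other->tx_backlog == 0 || !rx_allowed(self, pushback))
		return false;
	other->tx_backlog--;
	self->tx_backlog++;
	return true;
}

/* Deadlock: neither end can consume anything from the other. */
static bool deadlocked(const struct peer *a, const struct peer *b,
		       bool pushback)
{
	struct peer ca = *a, cb = *b;	/* probe on copies only */

	return !rx_one(&ca, &cb, pushback) && !rx_one(&cb, &ca, pushback);
}
```

With both backlogs at VQ_SIZE and pushback enabled, deadlocked() reports
true. With pushback disabled the same state makes progress, but nothing in
the model then caps tx_backlog, which is the bounded-memory concern.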
> > > >
> > > > Hmm.
> > > > Actually, let's think which packets require a response.
> > > >
> > > > VIRTIO_VSOCK_OP_REQUEST
> > > > VIRTIO_VSOCK_OP_SHUTDOWN
> > > > VIRTIO_VSOCK_OP_CREDIT_REQUEST
> > > >
> > > >
> > > > the response to these always reports a state of an existing socket.
> > > > and, only one type of response is relevant for each socket.
> > > >
> > > > So here's my suggestion:
> > > > stop queueing replies on the vsock device, instead,
> > > > simply store the response on the socket, and create a list of sockets
> > > > that have replies to be transmitted
> > > >
> > > >
> > > > WDYT?
> > > 
> > > 
> > > Wouldn't that create the same problem again? The socket will eventually push
> > > back on any new data because its FIFO is full. At that point, the "other
> > > side" could still have a queue full of requests on exactly that socket that
> > > need to get processed. We then cannot pull those packets off the virtio
> > > queue, because we cannot enqueue responses.
> > 
> > Either I don't understand what you wrote or I did not explain myself
> > clearly.
> 
> I didn't fully understand either, but with this last message of yours it's
> clear to me and I like the idea!
> 
> > 
> > In this idea there needs to be a single response enqueued
> > like this in the socket, because, no more than one ever needs to
> > be outstanding per socket.
> > 
> > For example, until VIRTIO_VSOCK_OP_REQUEST
> > is responded to, the socket is not active and does not need to
> > send anything.
> 
> One case I see is responding when we don't have a listening socket (e.g.
> the port is not open): where the user previously got a message that the port
> was not open, connect() would now time out instead. So we could respond if we
> have space in the virtqueue, and otherwise discard the reply without losing
> any important information or the guarantee of a lossless channel.
> 
> So in summary:
> 
> - if we have an associated socket, then always respond (possibly
>   allocating memory in the intermediate queue if the virtqueue is full
>   as we already do). We need to figure out if a flood of
>   VIRTIO_VSOCK_OP_CREDIT_REQUEST would cause problems, but we can always
>   decide not to respond if we have sent this identical information
>   before.

If we take this path, we need to consider whether not responding is within
the spec. But again, "credit update needed" is just a single flag we need
to set on a socket. If we have anything we need to send, it can also update
the credits.
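
To make the "flag on a socket plus a list of sockets with replies" idea
concrete, here is a minimal userspace sketch. It is not taken from the
driver: struct vsk, struct reply_list, the REPLY_* flags, mark_reply() and
flush_replies() are all hypothetical names, and the sketch assumes a single
outgoing packet can carry both the connection-level reply and the current
credit state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-socket reply flags. */
enum {
	REPLY_CONN   = 1u << 0,	/* reply to a connection-level request */
	REPLY_CREDIT = 1u << 1,	/* credit update needed */
};

/* Hypothetical per-socket state: flags instead of queued packets. */
struct vsk {
	unsigned int pending;	/* REPLY_* bits still owed to the peer */
	bool queued;		/* already linked on the reply list? */
	struct vsk *next;
};

struct reply_list {
	struct vsk *head, *tail;
};

/* Record that 'sk' owes a reply. Repeated requests cost no extra memory. */
static void mark_reply(struct reply_list *l, struct vsk *sk, unsigned int flag)
{
	sk->pending |= flag;
	if (sk->queued)
		return;
	sk->queued = true;
	sk->next = NULL;
	if (l->tail)
		l->tail->next = sk;
	else
		l->head = sk;
	l->tail = sk;
}

/*
 * Send one packet per listed socket while TX virtqueue space remains.
 * Assumption: that packet satisfies every pending flag at once.
 * Returns the number of packets sent.
 */
static int flush_replies(struct reply_list *l, int vq_free)
{
	int sent = 0;

	while (l->head && sent < vq_free) {
		struct vsk *sk = l->head;

		l->head = sk->next;
		if (!l->head)
			l->tail = NULL;
		sk->next = NULL;
		sk->queued = false;
		sk->pending = 0;	/* all owed replies go out together */
		sent++;
	}
	return sent;
}
```

In this sketch the memory held for replies is bounded by the number of
sockets rather than by the number of requests the peer sends, so a flood of
credit requests on one socket collapses into a single pending flag.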


> - if there is no associated socket, we only respond if virtqueue has
>   space.
> 
> I like it and it seems feasible without changing anything in the
> specification.
> 
> Did I get it right?
> 
> Thanks,
> Stefano

That was the idea, yes.
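
The two-case summary quoted above can be written down as a small decision
function. This is a sketch of the policy as discussed in this thread, not
existing kernel code: enum reply_action, its values, and the has_socket and
vq_free parameters are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a reply that RX processing wants to emit. */
enum reply_action {
	REPLY_SEND,	/* TX virtqueue has room: send immediately */
	REPLY_DEFER,	/* no room, but a socket exists: remember the reply */
	REPLY_DROP,	/* no room and no socket: discard the reply */
};

/*
 * With an associated socket we always respond, deferring if the virtqueue
 * is full. Without one (e.g. nothing listens on the port) we respond only
 * if there is space right now.
 */
static enum reply_action reply_policy(bool has_socket, int vq_free)
{
	if (vq_free > 0)
		return REPLY_SEND;
	return has_socket ? REPLY_DEFER : REPLY_DROP;
}
```

The REPLY_DROP case is the one where connect() on the other side would time
out instead of seeing an immediate error, as noted earlier in the thread.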

-- 
MST

