From: "Michael S. Tsirkin" <mst@redhat.com>
To: Bobby Eshleman <bobbyeshleman@gmail.com>
Cc: virtio-dev@lists.oasis-open.org, cong.wang@bytedance.com,
Bobby Eshleman <bobby.eshleman@bytedance.com>,
jiang.wang@bytedance.com, cohuck@redhat.com,
virtualization@lists.linux-foundation.org, stefanha@redhat.com,
virtio-comment@lists.oasis-open.org,
arseny.krasnov@kaspersky.com
Subject: Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
Date: Sat, 2 Sep 2023 04:41:28 -0400 [thread overview]
Message-ID: <20230902043532-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <ZPK7/84QeSHtYGBQ@bullseye>
On Sat, Sep 02, 2023 at 04:37:19AM +0000, Bobby Eshleman wrote:
> On Fri, Sep 01, 2023 at 09:31:03AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Aug 30, 2023 at 12:43:03AM +0000, Bobby Eshleman wrote:
> > > On Tue, Aug 29, 2023 at 06:21:35PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> > > > > This adds support for datagrams to the virtio-vsock device.
> > > > >
> > > > > virtio-vsock already supports stream and seqpacket types. The existing
> > > > > message types and header fields are extended to support datagrams.
> > > > > Semantic differences between the flow types are stated, as well as any
> > > > > additional requirements for devices and drivers implementing this
> > > > > feature.
> > > > >
> > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > ---
> > > > > device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> > > > > 1 file changed, 88 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > > > index 7d91d159872f..638dca8e5da1 100644
> > > > > --- a/device-types/vsock/description.tex
> > > > > +++ b/device-types/vsock/description.tex
> > > > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > > \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > > > > \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > > > > \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> > > > > \end{description}
> > > > >
> > > > > \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > > > consists of a (cid, port number) tuple. The header fields used for this are
> > > > > \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > > >
> > > > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> > > > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> > > > > +
> > > > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> > > > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> > > > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
> > > > >
> > > > > \begin{lstlisting}
> > > > > #define VIRTIO_VSOCK_TYPE_STREAM 1
> > > > > #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> > > > > +#define VIRTIO_VSOCK_TYPE_DGRAM 3
> > > > > \end{lstlisting}
> > > > >
> > > > > Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > > > without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> > > > > -connection-oriented delivery with message and record boundaries.
> > > > > +connection-oriented delivery with message and record boundaries. Datagram
> > > > > +sockets provide connection-less, best-effort delivery of messages, with no
> > > > > +order or reliability guarantees.
> > > > >
> > > > > \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> > > > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > > > > previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > > > > communicating updates any time a change in buffer space occurs.
> > > > >
> > > > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> > > > > +sockets. These fields are not used for datagram buffer space management.
> > > >
> > > > no limits on datagram size?
> > > >
> > >
> > > In the Linux proof-of-concept, it is 64KB. I can add that here too.
> >
> > or device driven maybe ...
> >
>
> Ah yes, I think Stefano was suggesting something like Laura's proposal:
> https://lists.oasis-open.org/archives/virtio-comment/202206/msg00093.html
>
> > > > also with no flow control at all there's a performance problem:
> > > > imagine queue gets full. now buffers begin to be dropped.
> > > > problem is, dropping is faster than delivering to application.
> > > > so now application sees its packets consumed super quickly
> > > > and starts sending even faster.
> > > > not good for performance.
> > > >
> > > > yes datagram expects some way to drop packets but just disabling flow
> > > > control completely is not the right thing to do I think.
> > > >
> > >
> > > On the LKML I discussed using congestion notification as a way to handle
> > > this situation, but deferred it to a future feature bit. I can build
> > > it in from the beginning though.
> >
> > as in messages to stop/start transmission? might work.
> >
>
> Yes, say for example that a flow is sending a large number of packets
> and the destination socket's receive queue becomes full. The destination
> sends a "stop" message with the destination address. More packets are
> likely to continue trailing in, and they may be dropped. The source
> device or driver will use the destination address to throttle further
> attempts of sockets to transmit to that destination, and the flow will
> stop.
>
> After the socket's receive queue is no longer full, it may notify the
> device/driver which can then send a "start" message with the destination
> address to any previously "stopped" sources. The sources can then
> reenable flows to that destination.
Ah so control is actually per socket. Another idea is that does not have
to be completely full - we can start suppressing a bit before it's full
to reduce packet drops.
>
> The other alternative I evaluated was using a single "stop for some
> time" message with an exponential or fibonacci backoff with a ceiling,
> but the downside is that a congested queue could block the "stop"
> message from reaching the source and eventually the source would become
> unthrottled even though the destination is *trying* to throttle it. I
> think this is bad for some pretty abusable cases, such as creating a
> socket that has no receiver. A stop/start message pair doesn't have this
> issue.
> > > >
> > > > > +
> > > > > \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > -sufficient free buffer space for the payload.
> > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > >
> > > > > All packets associated with a stream flow MUST contain valid information in
> > > > > \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > >
> > > > > \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > -sufficient free buffer space for the payload.
> > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > >
> > > > > All packets associated with a stream flow MUST contain valid information in
> > > > > \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> > > > > #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> > > > > \end{lstlisting}
> > > > >
> > > > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> > > > > +
> > > > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +Drivers MUST support assembly of received packet fragments according to the
> > > > > +fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +Devices MUST support assembly of received packet fragments according to the
> > > > > +fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > +
> > > > > +The driver MAY drop received packets with no notification to the device. This
> > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > +exists for the destination address.
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > +
> > > > > +The device MAY drop received packets with no notification to the driver. This
> > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > +exists for the destination address.
> > > > > +
> > > > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +\field{flags} may have the following bit set:
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> > > > > +it indicates that the current payload is the end of a datagram fragment OR that
> > > > > +the current payload is an entire datagram packet.
> > > > > +
> > > > > +Datagram fragmentation is subject to the following rules:
> > > > > +
> > > > > +The fragments for a datagram packet MUST be added to the virtqueue in
> > > > > +sequential order.
> > > > > +
> > > > > +If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
> > > > > +\field{flags} MUST be set.
> > > > > +
> > > > > +If a packet is the last fragment of a fragment sequence, then the
> > > > > +VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
> > > > > +
> > > > > +If a packet is any fragment except the last fragment of a fragment sequence,
> > > > > +then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
> > > > > +
> > > > > +If a driver or device has already placed a fragment of a packet on the
> > > > > +virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
> > > > > +before adding any additional packets or fragments to the virtqueue.
> > > > > +
> > > > > +If fragments for a packet are only partially received after an
> > > > > +implementation-specific amount of time, then the destination device or driver
> > > > > +MAY drop the fragments.
> > > >
> > > > and how to detect this partial situation?
> > > >
> > > > and if not then what happens?
> > >
> > > All packets with EOM=0 should be added into a per-flow fragment list.
> > >
> > > Once a packet with EOM=1 is received, they can all be concatenated and
> > > delivered to the destination socket.
> > >
> > > If after X time no packet with EOM=1 is received, then we have detected
> > > this partial situation. The "then what happens part" is that the
> > > fragment list can be dropped and freed. Because the peer must send the
> > > remaining fragments, they will also be dropped and freed until EOM=1 is
> > > received.
> >
> > aha. worth mentioning.
> >
> > also all the must/may/should really belong in conformance
> > sections.
> >
>
> Got it, will do.
>
> >
> > > >
> > > > > +Each buffer containing a fragment MUST begin with a valid struct
> > > > > +virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
> > > > > +only.
> > > >
> > > > what does this mean exactly?
> > > >
> > >
> > > It means that len only accounts for the fragment and not the whole
> > > (fragmented) packet. The second sentence means that the virtqueue buffer
> > > starts with the header.
> > >
> > > For virtio-net mergable RX buffers the header is only in the first
> > > buffer and the length field accounts for the entire fragmented packet
> > > (that spans multiple bufers), so I suspected the specification was
> > > needed here too.
> > >
> > > I'm happy to omit it.
> > > > > \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> > > > >
> > > > > Certain events are communicated by the device to the driver using the event
> > > > > --
> > > > > 2.20.1
> > > >
> > > > _______________________________________________
> > > > Virtualization mailing list
> > > > Virtualization@lists.linux-foundation.org
> > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> >
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
next prev parent reply other threads:[~2023-09-02 8:41 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20230829212945.3420727-1-bobby.eshleman@bytedance.com>
2023-08-29 22:21 ` [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Michael S. Tsirkin
[not found] ` <ZO6Ql9iWi6a8vDyU@bullseye>
2023-09-01 13:31 ` Michael S. Tsirkin
[not found] ` <ZPK7/84QeSHtYGBQ@bullseye>
2023-09-02 8:41 ` Michael S. Tsirkin [this message]
[not found] ` <ZPL/Ss6sXmfU/0Mg@bullseye>
2023-09-06 8:17 ` Michael S. Tsirkin
2023-09-01 12:45 ` Stefano Garzarella
[not found] ` <ZPLAip/TWqvWZ0hv@bullseye>
2023-09-02 8:35 ` Michael S. Tsirkin
2023-09-06 14:28 ` Stefano Garzarella
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230902043532-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=arseny.krasnov@kaspersky.com \
--cc=bobby.eshleman@bytedance.com \
--cc=bobbyeshleman@gmail.com \
--cc=cohuck@redhat.com \
--cc=cong.wang@bytedance.com \
--cc=jiang.wang@bytedance.com \
--cc=stefanha@redhat.com \
--cc=virtio-comment@lists.oasis-open.org \
--cc=virtio-dev@lists.oasis-open.org \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).