virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found] <20230829212945.3420727-1-bobby.eshleman@bytedance.com>
@ 2023-08-29 22:21 ` Michael S. Tsirkin
       [not found]   ` <ZO6Ql9iWi6a8vDyU@bullseye>
  2023-09-01 12:45 ` Stefano Garzarella
  1 sibling, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2023-08-29 22:21 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, jiang.wang, bobbyeshleman, cohuck,
	virtualization, stefanha, virtio-comment, arseny.krasnov

On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> This adds support for datagrams to the virtio-vsock device.
> 
> virtio-vsock already supports stream and seqpacket types. The existing
> message types and header fields are extended to support datagrams.
> Semantic differences between the flow types are stated, as well as any
> additional requirements for devices and drivers implementing this
> feature.
> 
> Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> ---
>  device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
>  1 file changed, 88 insertions(+), 7 deletions(-)
> 
> diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> index 7d91d159872f..638dca8e5da1 100644
> --- a/device-types/vsock/description.tex
> +++ b/device-types/vsock/description.tex
> @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
>  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
>  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
>  \end{description}
>  
>  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>  consists of a (cid, port number) tuple. The header fields used for this are
>  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>  
> -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> +
> +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
>  
>  \begin{lstlisting}
>  #define VIRTIO_VSOCK_TYPE_STREAM    1
>  #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> +#define VIRTIO_VSOCK_TYPE_DGRAM     3
>  \end{lstlisting}
>  
>  Stream sockets provide in-order, guaranteed, connection-oriented delivery
>  without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> -connection-oriented delivery with message and record boundaries.
> +connection-oriented delivery with message and record boundaries. Datagram
> +sockets provide connection-less, best-effort delivery of messages, with no
> +order or reliability guarantees.
>  
>  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
>  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
>  communicating updates any time a change in buffer space occurs.
>  
> +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> +sockets. These fields are not used for datagram buffer space management.

no limits on datagram size?

also with no flow control at all there's a performance problem:
imagine queue gets full. now buffers begin to be dropped.
problem is, dropping is faster than delivering to application.
so now application sees its packets consumed super quickly
and starts sending even faster.
not good for performance.

yes datagram expects some way to drop packets but just disabling flow
control completely is not the right thing to do I think.


> +
>  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> +transmitted when the peer has sufficient free buffer space for the payload.
>  
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
>  
>  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> -sufficient free buffer space for the payload.
> +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> +transmitted when the peer has sufficient free buffer space for the payload.
>  
>  All packets associated with a stream flow MUST contain valid information in
>  \field{buf_alloc} and \field{fwd_cnt} fields.
> @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
>  #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
>  \end{lstlisting}
>  
> +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> +
> +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> +
> +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> +packet, they MUST follow the fragmentation rules described in section
> +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> +
> +Drivers MUST support assembly of received packet fragments according to the
> +fragmentation rules described in section
> +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> +
> +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> +
> +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> +packet, they MUST follow the fragmentation rules described in section
> +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> +
> +Devices MUST support assembly of received packet fragments according to the
> +fragmentation rules described in section
> +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> +
> +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> +
> +The driver MAY drop received packets with no notification to the device. This
> +can happen if, for example, there are insufficient resources or no socket
> +exists for the destination address.
> +
> +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> +
> +The device MAY drop received packets with no notification to the driver. This
> +can happen if, for example, there are insufficient resources or no socket
> +exists for the destination address.
> +
> +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> +
> +\field{flags} may have the following bit set:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> +\end{lstlisting}
> +
> +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> +it indicates that the current payload is the end of a datagram fragment OR that
> +the current payload is an entire datagram packet.
> +
> +Datagram fragmentation is subject to the following rules:
> +
> +The fragments for a datagram packet MUST be added to the virtqueue in
> +sequential order.
> +
> +If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
> +\field{flags} MUST be set.
> +
> +If a packet is the last fragment of a fragment sequence, then the
> +VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
> +
> +If a packet is any fragment except the last fragment of a fragment sequence,
> +then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
> +
> +If a driver or device has already placed a fragment of a packet on the
> +virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
> +before adding any additional packets or fragments to the virtqueue.
> +
> +If fragments for a packet are only partially received after an
> +implementation-specific amount of time, then the destination device or driver
> +MAY drop the fragments.

and how to detect this partial situation?

and if not then what happens?

> +Each buffer containing a fragment MUST begin with a valid struct
> +virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
> +only.

what does this mean exactly?

>  \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
>  
>  Certain events are communicated by the device to the driver using the event
> -- 
> 2.20.1

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found] <20230829212945.3420727-1-bobby.eshleman@bytedance.com>
  2023-08-29 22:21 ` [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Michael S. Tsirkin
@ 2023-09-01 12:45 ` Stefano Garzarella
       [not found]   ` <ZPLAip/TWqvWZ0hv@bullseye>
  1 sibling, 1 reply; 7+ messages in thread
From: Stefano Garzarella @ 2023-09-01 12:45 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, jiang.wang, bobbyeshleman, cohuck,
	virtualization, mst, stefanha, virtio-comment, arseny.krasnov

On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
>This adds support for datagrams to the virtio-vsock device.
>
>virtio-vsock already supports stream and seqpacket types. The existing
>message types and header fields are extended to support datagrams.
>Semantic differences between the flow types are stated, as well as any
>additional requirements for devices and drivers implementing this
>feature.
>
>Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>---
> device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> 1 file changed, 88 insertions(+), 7 deletions(-)
>
>diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
>index 7d91d159872f..638dca8e5da1 100644
>--- a/device-types/vsock/description.tex
>+++ b/device-types/vsock/description.tex
>@@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
>+\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> \end{description}
>
> \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
>@@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> consists of a (cid, port number) tuple. The header fields used for this are
> \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>
>-Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
>-for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
>+
>+Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
>+1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
>+seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
>
> \begin{lstlisting}
> #define VIRTIO_VSOCK_TYPE_STREAM    1
> #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
>+#define VIRTIO_VSOCK_TYPE_DGRAM     3
> \end{lstlisting}
>
> Stream sockets provide in-order, guaranteed, connection-oriented delivery
> without message boundaries. Seqpacket sockets provide in-order, guaranteed,
>-connection-oriented delivery with message and record boundaries.
>+connection-oriented delivery with message and record boundaries. Datagram
>+sockets provide connection-less, best-effort delivery of messages, with no
>+order or reliability guarantees.
>
> \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
>@@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> communicating updates any time a change in buffer space occurs.
>
>+\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
>+sockets. These fields are not used for datagram buffer space management.
>+
> \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
>-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
>-sufficient free buffer space for the payload.
>+For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
>+transmitted when the peer has sufficient free buffer space for the payload.
>
> All packets associated with a stream flow MUST contain valid information in
> \field{buf_alloc} and \field{fwd_cnt} fields.
>
> \devicenormative{\paragraph}{Device Operation: Buffer Space 
> Management}{Device Types / Socket Device / Device Operation / Buffer 
> Space Management}
>-VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
>-sufficient free buffer space for the payload.
>+For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
>+transmitted when the peer has sufficient free buffer space for the payload.
>
> All packets associated with a stream flow MUST contain valid information in
> \field{buf_alloc} and \field{fwd_cnt} fields.
>@@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> \end{lstlisting}
>
>+\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
>+
>+\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
>+
>+Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
>+packet, they MUST follow the fragmentation rules described in section
>+\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>+
>+Drivers MUST support assembly of received packet fragments according to the
>+fragmentation rules described in section
>+\ref{sec:Device Types / Socket Device / Device Operation / Datagram 
>Sockets / Fragmentation}.
>+
>+\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
>+
>+Devices MAY disassemble packets into smaller fragments. If devices fragment a
>+packet, they MUST follow the fragmentation rules described in section
>+\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>+
>+Devices MUST support assembly of received packet fragments according to the
>+fragmentation rules described in section
>+\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>+
>+\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
>+
>+The driver MAY drop received packets with no notification to the device. This
>+can happen if, for example, there are insufficient resources or no socket
>+exists for the destination address.
>+
>+\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
>+
>+The device MAY drop received packets with no notification to the driver. This
>+can happen if, for example, there are insufficient resources or no socket
>+exists for the destination address.

Should we provide some notification if the socket does not exist at the
destination?

>+
>+\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
>+
>+\field{flags} may have the following bit set:
>+
>+\begin{lstlisting}
>+#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
>+\end{lstlisting}
>+
>+When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
>+it indicates that the current payload is the end of a datagram 
>fragment OR that
>+the current payload is an entire datagram packet.

In the destination, if we discard some fragments, then could we
reconstruct a different datagram from the one sent?

Is that anything acceptable?

Thanks,
Stefano

>+
>+Datagram fragmentation is subject to the following rules:
>+
>+The fragments for a datagram packet MUST be added to the virtqueue in
>+sequential order.
>+
>+If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
>+\field{flags} MUST be set.
>+
>+If a packet is the last fragment of a fragment sequence, then the
>+VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
>+
>+If a packet is any fragment except the last fragment of a fragment sequence,
>+then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
>+
>+If a driver or device has already placed a fragment of a packet on the
>+virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
>+before adding any additional packets or fragments to the virtqueue.
>+
>+If fragments for a packet are only partially received after an
>+implementation-specific amount of time, then the destination device or driver
>+MAY drop the fragments.
>+
>+Each buffer containing a fragment MUST begin with a valid struct
>+virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
>+only.
>+
> \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
>
> Certain events are communicated by the device to the driver using the event
>-- 
>2.20.1
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found]   ` <ZO6Ql9iWi6a8vDyU@bullseye>
@ 2023-09-01 13:31     ` Michael S. Tsirkin
       [not found]       ` <ZPK7/84QeSHtYGBQ@bullseye>
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2023-09-01 13:31 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, Bobby Eshleman, jiang.wang, cohuck,
	virtualization, stefanha, virtio-comment, arseny.krasnov

On Wed, Aug 30, 2023 at 12:43:03AM +0000, Bobby Eshleman wrote:
> On Tue, Aug 29, 2023 at 06:21:35PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> > > This adds support for datagrams to the virtio-vsock device.
> > > 
> > > virtio-vsock already supports stream and seqpacket types. The existing
> > > message types and header fields are extended to support datagrams.
> > > Semantic differences between the flow types are stated, as well as any
> > > additional requirements for devices and drivers implementing this
> > > feature.
> > > 
> > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > ---
> > >  device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> > >  1 file changed, 88 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > index 7d91d159872f..638dca8e5da1 100644
> > > --- a/device-types/vsock/description.tex
> > > +++ b/device-types/vsock/description.tex
> > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> > >  \end{description}
> > >  
> > >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > >  consists of a (cid, port number) tuple. The header fields used for this are
> > >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > >  
> > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> > > +
> > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
> > >  
> > >  \begin{lstlisting}
> > >  #define VIRTIO_VSOCK_TYPE_STREAM    1
> > >  #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> > > +#define VIRTIO_VSOCK_TYPE_DGRAM     3
> > >  \end{lstlisting}
> > >  
> > >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > >  without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> > > -connection-oriented delivery with message and record boundaries.
> > > +connection-oriented delivery with message and record boundaries. Datagram
> > > +sockets provide connection-less, best-effort delivery of messages, with no
> > > +order or reliability guarantees.
> > >  
> > >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > >  communicating updates any time a change in buffer space occurs.
> > >  
> > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> > > +sockets. These fields are not used for datagram buffer space management.
> > 
> > no limits on datagram size?
> > 
> 
> In the Linux proof-of-concept, it is 64KB. I can add that here too.

or device driven maybe ...

> > also with no flow control at all there's a performance problem:
> > imagine queue gets full. now buffers begin to be dropped.
> > problem is, dropping is faster than delivering to application.
> > so now application sees its packets consumed super quickly
> > and starts sending even faster.
> > not good for performance.
> > 
> > yes datagram expects some way to drop packets but just disabling flow
> > control completely is not the right thing to do I think.
> > 
> 
> On the LKML I discussed using congestion notification as a way to handle
> this situation, but deferred it to a future feature bit. I can build
> it in from the beginning though.

as in messages to stop/start transmission? might work.

> > 
> > > +
> > >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > +transmitted when the peer has sufficient free buffer space for the payload.
> > >  
> > >  All packets associated with a stream flow MUST contain valid information in
> > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > >  
> > >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > +transmitted when the peer has sufficient free buffer space for the payload.
> > >  
> > >  All packets associated with a stream flow MUST contain valid information in
> > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> > >  #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> > >  \end{lstlisting}
> > >  
> > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> > > +
> > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > +
> > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> > > +packet, they MUST follow the fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +Drivers MUST support assembly of received packet fragments according to the
> > > +fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > +
> > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> > > +packet, they MUST follow the fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +Devices MUST support assembly of received packet fragments according to the
> > > +fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > +
> > > +The driver MAY drop received packets with no notification to the device. This
> > > +can happen if, for example, there are insufficient resources or no socket
> > > +exists for the destination address.
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > +
> > > +The device MAY drop received packets with no notification to the driver. This
> > > +can happen if, for example, there are insufficient resources or no socket
> > > +exists for the destination address.
> > > +
> > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> > > +
> > > +\field{flags} may have the following bit set:
> > > +
> > > +\begin{lstlisting}
> > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> > > +\end{lstlisting}
> > > +
> > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> > > +it indicates that the current payload is the end of a datagram fragment OR that
> > > +the current payload is an entire datagram packet.
> > > +
> > > +Datagram fragmentation is subject to the following rules:
> > > +
> > > +The fragments for a datagram packet MUST be added to the virtqueue in
> > > +sequential order.
> > > +
> > > +If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
> > > +\field{flags} MUST be set.
> > > +
> > > +If a packet is the last fragment of a fragment sequence, then the
> > > +VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
> > > +
> > > +If a packet is any fragment except the last fragment of a fragment sequence,
> > > +then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
> > > +
> > > +If a driver or device has already placed a fragment of a packet on the
> > > +virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
> > > +before adding any additional packets or fragments to the virtqueue.
> > > +
> > > +If fragments for a packet are only partially received after an
> > > +implementation-specific amount of time, then the destination device or driver
> > > +MAY drop the fragments.
> > 
> > and how to detect this partial situation?
> > 
> > and if not then what happens?
> 
> All packets with EOM=0 should be added into a per-flow fragment list.
> 
> Once a packet with EOM=1 is received, they can all be concatenated and
> delivered to the destination socket.
> 
> If after X time no packet with EOM=1 is received, then we have detected
> this partial situation. The "then what happens part" is that the
> fragment list can be dropped and freed. Because the peer must send the
> remaining fragments, they will also be dropped and freed until EOM=1 is
> received.

aha. worth mentioning.

also all the must/may/should really belong in conformance
sections.


> > 
> > > +Each buffer containing a fragment MUST begin with a valid struct
> > > +virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
> > > +only.
> > 
> > what does this mean exactly?
> > 
> 
> It means that len only accounts for the fragment and not the whole
> (fragmented) packet. The second sentence means that the virtqueue buffer
> starts with the header.
> 
> For virtio-net mergable RX buffers the header is only in the first
> buffer and the length field accounts for the entire fragmented packet
> (that spans multiple bufers), so I suspected the specification was
> needed here too.
> 
> I'm happy to omit it.
> > >  \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> > >  
> > >  Certain events are communicated by the device to the driver using the event
> > > -- 
> > > 2.20.1
> > 
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found]   ` <ZPLAip/TWqvWZ0hv@bullseye>
@ 2023-09-02  8:35     ` Michael S. Tsirkin
  2023-09-06 14:28       ` Stefano Garzarella
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2023-09-02  8:35 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, Bobby Eshleman, jiang.wang, cohuck,
	virtualization, stefanha, virtio-comment, arseny.krasnov

On Sat, Sep 02, 2023 at 04:56:42AM +0000, Bobby Eshleman wrote:
> On Fri, Sep 01, 2023 at 02:45:14PM +0200, Stefano Garzarella wrote:
> > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> > > This adds support for datagrams to the virtio-vsock device.
> > > 
> > > virtio-vsock already supports stream and seqpacket types. The existing
> > > message types and header fields are extended to support datagrams.
> > > Semantic differences between the flow types are stated, as well as any
> > > additional requirements for devices and drivers implementing this
> > > feature.
> > > 
> > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > ---
> > > device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> > > 1 file changed, 88 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > index 7d91d159872f..638dca8e5da1 100644
> > > --- a/device-types/vsock/description.tex
> > > +++ b/device-types/vsock/description.tex
> > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > > \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > > \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> > > \end{description}
> > > 
> > > \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > consists of a (cid, port number) tuple. The header fields used for this are
> > > \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > 
> > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> > > +
> > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
> > > 
> > > \begin{lstlisting}
> > > #define VIRTIO_VSOCK_TYPE_STREAM    1
> > > #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> > > +#define VIRTIO_VSOCK_TYPE_DGRAM     3
> > > \end{lstlisting}
> > > 
> > > Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> > > -connection-oriented delivery with message and record boundaries.
> > > +connection-oriented delivery with message and record boundaries. Datagram
> > > +sockets provide connection-less, best-effort delivery of messages, with no
> > > +order or reliability guarantees.
> > > 
> > > \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > > previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > > communicating updates any time a change in buffer space occurs.
> > > 
> > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> > > +sockets. These fields are not used for datagram buffer space management.
> > > +
> > > \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > 
> > > All packets associated with a stream flow MUST contain valid information in
> > > \field{buf_alloc} and \field{fwd_cnt} fields.
> > > 
> > > \devicenormative{\paragraph}{Device Operation: Buffer Space
> > > Management}{Device Types / Socket Device / Device Operation / Buffer
> > > Space Management}
> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > -sufficient free buffer space for the payload.
> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > 
> > > All packets associated with a stream flow MUST contain valid information in
> > > \field{buf_alloc} and \field{fwd_cnt} fields.
> > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> > > #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> > > \end{lstlisting}
> > > 
> > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> > > +
> > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > +
> > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> > > +packet, they MUST follow the fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +Drivers MUST support assembly of received packet fragments according to the
> > > +fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram
> > > Sockets / Fragmentation}.
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > +
> > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> > > +packet, they MUST follow the fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +Devices MUST support assembly of received packet fragments according to the
> > > +fragmentation rules described in section
> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > +
> > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > +
> > > +The driver MAY drop received packets with no notification to the device. This
> > > +can happen if, for example, there are insufficient resources or no socket
> > > +exists for the destination address.
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > +
> > > +The device MAY drop received packets with no notification to the driver. This
> > > +can happen if, for example, there are insufficient resources or no socket
> > > +exists for the destination address.
> > 
> > Should we provide some notification if the socket does not exist at the
> > destination?
> > 
> 
> Yes, I think so. I believe a start/stop congestion notification scheme
> actually manages this issue well.
> 
> For example, the source begins sending packets to a destination.
> 
> The destination finds that there exists no socket for that destination
> address. The destination sends a "stop" notification to the source that
> contains the address in question. Meanwhile, packets are still coming in
> but they are being dropped.
> 
> The source receives the "stop" notification with the address and adds it
> to the "stopped destinations" list. Any new packet destination address
> will be compared to that list. Any matches will be dropped before
> sending (and ideally, before wasting time allocating the packet).
> 
> Only when a socket is bound to an address that matches a "stopped"
> address does the destination send a "start" notification to any source
> it has previusly sent a "stop" notification to.
> 
> Once "start" is received, flow may resume as normal.

Again, dropping as control flow tactic has a bunch of problems.
Blocking senders sounds more reasonable.


> > > +
> > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> > > +
> > > +\field{flags} may have the following bit set:
> > > +
> > > +\begin{lstlisting}
> > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> > > +\end{lstlisting}
> > > +
> > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> > > +it indicates that the current payload is the end of a datagram fragment
> > > OR that
> > > +the current payload is an entire datagram packet.
> > 
> > In the destination, if we discard some fragments, then could we
> > reconstruct a different datagram from the one sent?
> > 
> > Is that anything acceptable?
> > 
> 
> Dropping fragments should be explicitly disallowed. The sender is
> explicitly disallowed from NOT placing fragments on the virtqueue, but I
> see that I am missing the piece that states that they may not be dropped
> on the receive side.
> 
> I think it is worth mentioning that implicit in this spec is that
> socket-to-socket dgram communication is unreliable, but device-to-driver
> (and vice versa) is still reliable. That is, we can rely at least on the
> virtqueues to work... and if they fail then the device/driver can simply
> requeue (think send_pkt_queue in Linux)... so there is some reliability
> at the lowest layer.

Well you have this weird timeout thing for some reason.

> > Thanks,
> > Stefano
> > 
> 
> Thanks!
> Bobby

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found]       ` <ZPK7/84QeSHtYGBQ@bullseye>
@ 2023-09-02  8:41         ` Michael S. Tsirkin
       [not found]           ` <ZPL/Ss6sXmfU/0Mg@bullseye>
  0 siblings, 1 reply; 7+ messages in thread
From: Michael S. Tsirkin @ 2023-09-02  8:41 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, Bobby Eshleman, jiang.wang, cohuck,
	virtualization, stefanha, virtio-comment, arseny.krasnov

On Sat, Sep 02, 2023 at 04:37:19AM +0000, Bobby Eshleman wrote:
> On Fri, Sep 01, 2023 at 09:31:03AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Aug 30, 2023 at 12:43:03AM +0000, Bobby Eshleman wrote:
> > > On Tue, Aug 29, 2023 at 06:21:35PM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> > > > > This adds support for datagrams to the virtio-vsock device.
> > > > > 
> > > > > virtio-vsock already supports stream and seqpacket types. The existing
> > > > > message types and header fields are extended to support datagrams.
> > > > > Semantic differences between the flow types are stated, as well as any
> > > > > additional requirements for devices and drivers implementing this
> > > > > feature.
> > > > > 
> > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > ---
> > > > >  device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> > > > >  1 file changed, 88 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > > > index 7d91d159872f..638dca8e5da1 100644
> > > > > --- a/device-types/vsock/description.tex
> > > > > +++ b/device-types/vsock/description.tex
> > > > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > > > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > > > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> > > > >  \end{description}
> > > > >  
> > > > >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > > >  consists of a (cid, port number) tuple. The header fields used for this are
> > > > >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > > >  
> > > > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> > > > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> > > > > +
> > > > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> > > > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> > > > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
> > > > >  
> > > > >  \begin{lstlisting}
> > > > >  #define VIRTIO_VSOCK_TYPE_STREAM    1
> > > > >  #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> > > > > +#define VIRTIO_VSOCK_TYPE_DGRAM     3
> > > > >  \end{lstlisting}
> > > > >  
> > > > >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > > >  without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> > > > > -connection-oriented delivery with message and record boundaries.
> > > > > +connection-oriented delivery with message and record boundaries. Datagram
> > > > > +sockets provide connection-less, best-effort delivery of messages, with no
> > > > > +order or reliability guarantees.
> > > > >  
> > > > >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> > > > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > > > >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > > > >  communicating updates any time a change in buffer space occurs.
> > > > >  
> > > > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> > > > > +sockets. These fields are not used for datagram buffer space management.
> > > > 
> > > > no limits on datagram size?
> > > > 
> > > 
> > > In the Linux proof-of-concept, it is 64KB. I can add that here too.
> > 
> > or device driven maybe ...
> > 
> 
> Ah yes, I think Stefano was suggesting something like Laura's proposal:
> https://lists.oasis-open.org/archives/virtio-comment/202206/msg00093.html
> 
> > > > also with no flow control at all there's a performance problem:
> > > > imagine queue gets full. now buffers begin to be dropped.
> > > > problem is, dropping is faster than delivering to application.
> > > > so now application sees its packets consumed super quickly
> > > > and starts sending even faster.
> > > > not good for performance.
> > > > 
> > > > yes datagram expects some way to drop packets but just disabling flow
> > > > control completely is not the right thing to do I think.
> > > > 
> > > 
> > > On the LKML I discussed using congestion notification as a way to handle
> > > this situation, but deferred it to a future feature bit. I can build
> > > it in from the beginning though.
> > 
> > as in messages to stop/start transmission? might work.
> > 
> 
> Yes, say for example that a flow is sending a large number of packets
> and the destination socket's receive queue becomes full. The destination
> sends a "stop" message with the destination address. More packets are
> likely to continue trailing in, and they may be dropped. The source
> device or driver will use the destination address to throttle further
> attempts of sockets to transmit to that destination, and the flow will
> stop.
> 
> After the socket's receive queue is no longer full, it may notify the
> device/driver which can then send a "start" message with the destination
> address to any previously "stopped" sources. The sources can then
> reenable flows to that destination.

Ah so control is actually per socket. Another idea is that does not have
to be completely full - we can start suppressing a bit before it's full
to reduce packet drops.

> 
> The other alternative I evaluated was using a single "stop for some
> time" message with an exponential or fibonacci backoff with a ceiling,
> but the downside is that a congested queue could block the "stop"
> message from reaching the source and eventually the source would become
> unthrottled even though the destination is *trying* to throttle it. I
> think this is bad for some pretty abusable cases, such as creating a
> socket that has no receiver. A stop/start message pair doesn't have this
> issue.
> > > > 
> > > > > +
> > > > >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > -sufficient free buffer space for the payload.
> > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > >  
> > > > >  All packets associated with a stream flow MUST contain valid information in
> > > > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > >  
> > > > >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > -sufficient free buffer space for the payload.
> > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > >  
> > > > >  All packets associated with a stream flow MUST contain valid information in
> > > > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> > > > >  #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> > > > >  \end{lstlisting}
> > > > >  
> > > > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> > > > > +
> > > > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +Drivers MUST support assembly of received packet fragments according to the
> > > > > +fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +Devices MUST support assembly of received packet fragments according to the
> > > > > +fragmentation rules described in section
> > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > +
> > > > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > +
> > > > > +The driver MAY drop received packets with no notification to the device. This
> > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > +exists for the destination address.
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > +
> > > > > +The device MAY drop received packets with no notification to the driver. This
> > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > +exists for the destination address.
> > > > > +
> > > > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> > > > > +
> > > > > +\field{flags} may have the following bit set:
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> > > > > +it indicates that the current payload is the end of a datagram fragment OR that
> > > > > +the current payload is an entire datagram packet.
> > > > > +
> > > > > +Datagram fragmentation is subject to the following rules:
> > > > > +
> > > > > +The fragments for a datagram packet MUST be added to the virtqueue in
> > > > > +sequential order.
> > > > > +
> > > > > +If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
> > > > > +\field{flags} MUST be set.
> > > > > +
> > > > > +If a packet is the last fragment of a fragment sequence, then the
> > > > > +VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
> > > > > +
> > > > > +If a packet is any fragment except the last fragment of a fragment sequence,
> > > > > +then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
> > > > > +
> > > > > +If a driver or device has already placed a fragment of a packet on the
> > > > > +virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
> > > > > +before adding any additional packets or fragments to the virtqueue.
> > > > > +
> > > > > +If fragments for a packet are only partially received after an
> > > > > +implementation-specific amount of time, then the destination device or driver
> > > > > +MAY drop the fragments.
> > > > 
> > > > and how to detect this partial situation?
> > > > 
> > > > and if not then what happens?
> > > 
> > > All packets with EOM=0 should be added into a per-flow fragment list.
> > > 
> > > Once a packet with EOM=1 is received, they can all be concatenated and
> > > delivered to the destination socket.
> > > 
> > > If after X time no packet with EOM=1 is received, then we have detected
> > > this partial situation. The "then what happens part" is that the
> > > fragment list can be dropped and freed. Because the peer must send the
> > > remaining fragments, they will also be dropped and freed until EOM=1 is
> > > received.
> > 
> > aha. worth mentioning.
> > 
> > also all the must/may/should really belong in conformance
> > sections.
> > 
> 
> Got it, will do.
> 
> > 
> > > > 
> > > > > +Each buffer containing a fragment MUST begin with a valid struct
> > > > > +virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
> > > > > +only.
> > > > 
> > > > what does this mean exactly?
> > > > 
> > > 
> > > It means that len only accounts for the fragment and not the whole
> > > (fragmented) packet. The second sentence means that the virtqueue buffer
> > > starts with the header.
> > > 
> > > For virtio-net mergable RX buffers the header is only in the first
> > > buffer and the length field accounts for the entire fragmented packet
> > > (that spans multiple bufers), so I suspected the specification was
> > > needed here too.
> > > 
> > > I'm happy to omit it.
> > > > >  \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> > > > >  
> > > > >  Certain events are communicated by the device to the driver using the event
> > > > > -- 
> > > > > 2.20.1
> > > > 
> > > > _______________________________________________
> > > > Virtualization mailing list
> > > > Virtualization@lists.linux-foundation.org
> > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > 
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
       [not found]           ` <ZPL/Ss6sXmfU/0Mg@bullseye>
@ 2023-09-06  8:17             ` Michael S. Tsirkin
  0 siblings, 0 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:17 UTC (permalink / raw)
  To: Bobby Eshleman
  Cc: virtio-dev, cong.wang, Bobby Eshleman, jiang.wang, cohuck,
	virtualization, stefanha, virtio-comment, arseny.krasnov

On Sat, Sep 02, 2023 at 09:24:26AM +0000, Bobby Eshleman wrote:
> On Sat, Sep 02, 2023 at 04:41:28AM -0400, Michael S. Tsirkin wrote:
> > On Sat, Sep 02, 2023 at 04:37:19AM +0000, Bobby Eshleman wrote:
> > > On Fri, Sep 01, 2023 at 09:31:03AM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 30, 2023 at 12:43:03AM +0000, Bobby Eshleman wrote:
> > > > > On Tue, Aug 29, 2023 at 06:21:35PM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
> > > > > > > This adds support for datagrams to the virtio-vsock device.
> > > > > > > 
> > > > > > > virtio-vsock already supports stream and seqpacket types. The existing
> > > > > > > message types and header fields are extended to support datagrams.
> > > > > > > Semantic differences between the flow types are stated, as well as any
> > > > > > > additional requirements for devices and drivers implementing this
> > > > > > > feature.
> > > > > > > 
> > > > > > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
> > > > > > > ---
> > > > > > >  device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
> > > > > > >  1 file changed, 88 insertions(+), 7 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
> > > > > > > index 7d91d159872f..638dca8e5da1 100644
> > > > > > > --- a/device-types/vsock/description.tex
> > > > > > > +++ b/device-types/vsock/description.tex
> > > > > > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
> > > > > > >  \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
> > > > > > >  \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
> > > > > > >  \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
> > > > > > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
> > > > > > >  \end{description}
> > > > > > >  
> > > > > > >  \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
> > > > > > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
> > > > > > >  consists of a (cid, port number) tuple. The header fields used for this are
> > > > > > >  \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
> > > > > > >  
> > > > > > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
> > > > > > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
> > > > > > > +
> > > > > > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
> > > > > > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
> > > > > > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
> > > > > > >  
> > > > > > >  \begin{lstlisting}
> > > > > > >  #define VIRTIO_VSOCK_TYPE_STREAM    1
> > > > > > >  #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
> > > > > > > +#define VIRTIO_VSOCK_TYPE_DGRAM     3
> > > > > > >  \end{lstlisting}
> > > > > > >  
> > > > > > >  Stream sockets provide in-order, guaranteed, connection-oriented delivery
> > > > > > >  without message boundaries. Seqpacket sockets provide in-order, guaranteed,
> > > > > > > -connection-oriented delivery with message and record boundaries.
> > > > > > > +connection-oriented delivery with message and record boundaries. Datagram
> > > > > > > +sockets provide connection-less, best-effort delivery of messages, with no
> > > > > > > +order or reliability guarantees.
> > > > > > >  
> > > > > > >  \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > > >  \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
> > > > > > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
> > > > > > >  previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
> > > > > > >  communicating updates any time a change in buffer space occurs.
> > > > > > >  
> > > > > > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
> > > > > > > +sockets. These fields are not used for datagram buffer space management.
> > > > > > 
> > > > > > no limits on datagram size?
> > > > > > 
> > > > > 
> > > > > In the Linux proof-of-concept, it is 64KB. I can add that here too.
> > > > 
> > > > or device driven maybe ...
> > > > 
> > > 
> > > Ah yes, I think Stefano was suggesting something like Laura's proposal:
> > > https://lists.oasis-open.org/archives/virtio-comment/202206/msg00093.html
> > > 
> > > > > > also with no flow control at all there's a performance problem:
> > > > > > imagine queue gets full. now buffers begin to be dropped.
> > > > > > problem is, dropping is faster than delivering to application.
> > > > > > so now application sees its packets consumed super quickly
> > > > > > and starts sending even faster.
> > > > > > not good for performance.
> > > > > > 
> > > > > > yes datagram expects some way to drop packets but just disabling flow
> > > > > > control completely is not the right thing to do I think.
> > > > > > 
> > > > > 
> > > > > On the LKML I discussed using congestion notification as a way to handle
> > > > > this situation, but deferred it to a future feature bit. I can build
> > > > > it in from the beginning though.
> > > > 
> > > > as in messages to stop/start transmission? might work.
> > > > 
> > > 
> > > Yes, say for example that a flow is sending a large number of packets
> > > and the destination socket's receive queue becomes full. The destination
> > > sends a "stop" message with the destination address. More packets are
> > > likely to continue trailing in, and they may be dropped. The source
> > > device or driver will use the destination address to throttle further
> > > attempts of sockets to transmit to that destination, and the flow will
> > > stop.
> > > 
> > > After the socket's receive queue is no longer full, it may notify the
> > > device/driver which can then send a "start" message with the destination
> > > address to any previously "stopped" sources. The sources can then
> > > reenable flows to that destination.
> > 
> > Ah so control is actually per socket. Another idea is that does not have
> > to be completely full - we can start suppressing a bit before it's full
> > to reduce packet drops.
> > 
> 
> Totally, yes that makes sense.


I don't know how we prevent blocking traffic to unrelated
destinations in this case though.  Do we have to?

> > > 
> > > The other alternative I evaluated was using a single "stop for some
> > > time" message with an exponential or fibonacci backoff with a ceiling,
> > > but the downside is that a congested queue could block the "stop"
> > > message from reaching the source and eventually the source would become
> > > unthrottled even though the destination is *trying* to throttle it. I
> > > think this is bad for some pretty abusable cases, such as creating a
> > > socket that has no receiver. A stop/start message pair doesn't have this
> > > issue.
> > > > > > 
> > > > > > > +
> > > > > > >  \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > > > -sufficient free buffer space for the payload.
> > > > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > > > >  
> > > > > > >  All packets associated with a stream flow MUST contain valid information in
> > > > > > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > > > >  
> > > > > > >  \devicenormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
> > > > > > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
> > > > > > > -sufficient free buffer space for the payload.
> > > > > > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
> > > > > > > +transmitted when the peer has sufficient free buffer space for the payload.
> > > > > > >  
> > > > > > >  All packets associated with a stream flow MUST contain valid information in
> > > > > > >  \field{buf_alloc} and \field{fwd_cnt} fields.
> > > > > > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
> > > > > > >  #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
> > > > > > >  \end{lstlisting}
> > > > > > >  
> > > > > > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
> > > > > > > +
> > > > > > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > > > +
> > > > > > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
> > > > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > > > +
> > > > > > > +Drivers MUST support assembly of received packet fragments according to the
> > > > > > > +fragmentation rules described in section
> > > > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > > > +
> > > > > > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
> > > > > > > +
> > > > > > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
> > > > > > > +packet, they MUST follow the fragmentation rules described in section
> > > > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > > > +
> > > > > > > +Devices MUST support assembly of received packet fragments according to the
> > > > > > > +fragmentation rules described in section
> > > > > > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
> > > > > > > +
> > > > > > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > > > +
> > > > > > > +The driver MAY drop received packets with no notification to the device. This
> > > > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > > > +exists for the destination address.
> > > > > > > +
> > > > > > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
> > > > > > > +
> > > > > > > +The device MAY drop received packets with no notification to the driver. This
> > > > > > > +can happen if, for example, there are insufficient resources or no socket
> > > > > > > +exists for the destination address.
> > > > > > > +
> > > > > > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
> > > > > > > +
> > > > > > > +\field{flags} may have the following bit set:
> > > > > > > +
> > > > > > > +\begin{lstlisting}
> > > > > > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
> > > > > > > +\end{lstlisting}
> > > > > > > +
> > > > > > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
> > > > > > > +it indicates that the current payload is the end of a datagram fragment OR that
> > > > > > > +the current payload is an entire datagram packet.
> > > > > > > +
> > > > > > > +Datagram fragmentation is subject to the following rules:
> > > > > > > +
> > > > > > > +The fragments for a datagram packet MUST be added to the virtqueue in
> > > > > > > +sequential order.
> > > > > > > +
> > > > > > > +If a packet is not a fragment, then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of
> > > > > > > +\field{flags} MUST be set.
> > > > > > > +
> > > > > > > +If a packet is the last fragment of a fragment sequence, then the
> > > > > > > +VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST be set.
> > > > > > > +
> > > > > > > +If a packet is any fragment except the last fragment of a fragment sequence,
> > > > > > > +then the VIRTIO_VSOCK_DGRAM_EOM bit (bit 0) of \field{flags} MUST NOT be set.
> > > > > > > +
> > > > > > > +If a driver or device has already placed a fragment of a packet on the
> > > > > > > +virtqueue, it MUST add the remaining fragments of the packet to the virtqueue
> > > > > > > +before adding any additional packets or fragments to the virtqueue.
> > > > > > > +
> > > > > > > +If fragments for a packet are only partially received after an
> > > > > > > +implementation-specific amount of time, then the destination device or driver
> > > > > > > +MAY drop the fragments.
> > > > > > 
> > > > > > and how to detect this partial situation?
> > > > > > 
> > > > > > and if not then what happens?
> > > > > 
> > > > > All packets with EOM=0 should be added into a per-flow fragment list.
> > > > > 
> > > > > Once a packet with EOM=1 is received, they can all be concatenated and
> > > > > delivered to the destination socket.
> > > > > 
> > > > > If after X time no packet with EOM=1 is received, then we have detected
> > > > > this partial situation. The "then what happens part" is that the
> > > > > fragment list can be dropped and freed. Because the peer must send the
> > > > > remaining fragments, they will also be dropped and freed until EOM=1 is
> > > > > received.
> > > > 
> > > > aha. worth mentioning.
> > > > 
> > > > also all the must/may/should really belong in conformance
> > > > sections.
> > > > 
> > > 
> > > Got it, will do.
> > > 
> > > > 
> > > > > > 
> > > > > > > +Each buffer containing a fragment MUST begin with a valid struct
> > > > > > > +virtio_vsock_hdr. \field{len} MUST equal the length of the fragment payload
> > > > > > > +only.
> > > > > > 
> > > > > > what does this mean exactly?
> > > > > > 
> > > > > 
> > > > > It means that len only accounts for the fragment and not the whole
> > > > > (fragmented) packet. The second sentence means that the virtqueue buffer
> > > > > starts with the header.
> > > > > 
> > > > > For virtio-net mergable RX buffers the header is only in the first
> > > > > buffer and the length field accounts for the entire fragmented packet
> > > > > (that spans multiple bufers), so I suspected the specification was
> > > > > needed here too.
> > > > > 
> > > > > I'm happy to omit it.
> > > > > > >  \subsubsection{Device Events}\label{sec:Device Types / Socket Device / Device Operation / Device Events}
> > > > > > >  
> > > > > > >  Certain events are communicated by the device to the driver using the event
> > > > > > > -- 
> > > > > > > 2.20.1
> > > > > > 
> > > > > > _______________________________________________
> > > > > > Virtualization mailing list
> > > > > > Virtualization@lists.linux-foundation.org
> > > > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > > > 
> > > > _______________________________________________
> > > > Virtualization mailing list
> > > > Virtualization@lists.linux-foundation.org
> > > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> > 
> > _______________________________________________
> > Virtualization mailing list
> > Virtualization@lists.linux-foundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/virtualization

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
  2023-09-02  8:35     ` Michael S. Tsirkin
@ 2023-09-06 14:28       ` Stefano Garzarella
  0 siblings, 0 replies; 7+ messages in thread
From: Stefano Garzarella @ 2023-09-06 14:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Bobby Eshleman, cong.wang, Bobby Eshleman, jiang.wang, virtio-dev,
	cohuck, virtualization, stefanha, virtio-comment, arseny.krasnov

On Sat, Sep 02, 2023 at 04:35:25AM -0400, Michael S. Tsirkin wrote:
>On Sat, Sep 02, 2023 at 04:56:42AM +0000, Bobby Eshleman wrote:
>> On Fri, Sep 01, 2023 at 02:45:14PM +0200, Stefano Garzarella wrote:
>> > On Tue, Aug 29, 2023 at 09:29:45PM +0000, Bobby Eshleman wrote:
>> > > This adds support for datagrams to the virtio-vsock device.
>> > >
>> > > virtio-vsock already supports stream and seqpacket types. The existing
>> > > message types and header fields are extended to support datagrams.
>> > > Semantic differences between the flow types are stated, as well as any
>> > > additional requirements for devices and drivers implementing this
>> > > feature.
>> > >
>> > > Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
>> > > ---
>> > > device-types/vsock/description.tex | 95 +++++++++++++++++++++++++++---
>> > > 1 file changed, 88 insertions(+), 7 deletions(-)
>> > >
>> > > diff --git a/device-types/vsock/description.tex b/device-types/vsock/description.tex
>> > > index 7d91d159872f..638dca8e5da1 100644
>> > > --- a/device-types/vsock/description.tex
>> > > +++ b/device-types/vsock/description.tex
>> > > @@ -20,6 +20,7 @@ \subsection{Feature bits}\label{sec:Device Types / Socket Device / Feature bits}
>> > > \item[VIRTIO_VSOCK_F_STREAM (0)] stream socket type is supported.
>> > > \item[VIRTIO_VSOCK_F_SEQPACKET (1)] seqpacket socket type is supported.
>> > > \item[VIRTIO_VSOCK_F_NO_IMPLIED_STREAM (2)] stream socket type is not implied.
>> > > +\item[VIRTIO_VSOCK_F_DGRAM (3)] datagram socket type is supported.
>> > > \end{description}
>> > >
>> > > \drivernormative{\subsubsection}{Feature bits}{Device Types / Socket Device / Feature bits}
>> > > @@ -167,17 +168,22 @@ \subsubsection{Addressing}\label{sec:Device Types / Socket Device / Device Opera
>> > > consists of a (cid, port number) tuple. The header fields used for this are
>> > > \field{src_cid}, \field{src_port}, \field{dst_cid}, and \field{dst_port}.
>> > >
>> > > -Currently stream and seqpacket sockets are supported. \field{type} is 1 (VIRTIO_VSOCK_TYPE_STREAM)
>> > > -for stream socket types, and 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for seqpacket socket types.
>> > > +
>> > > +Currently stream, seqpacket, and datagram sockets are supported. \field{type} is
>> > > +1 (VIRTIO_VSOCK_TYPE_STREAM) for stream socket types, 2 (VIRTIO_VSOCK_TYPE_SEQPACKET) for
>> > > +seqpacket socket types, and 3 (VIRTIO_VSOCK_TYPE_DGRAM) for datagram socket types.
>> > >
>> > > \begin{lstlisting}
>> > > #define VIRTIO_VSOCK_TYPE_STREAM    1
>> > > #define VIRTIO_VSOCK_TYPE_SEQPACKET 2
>> > > +#define VIRTIO_VSOCK_TYPE_DGRAM     3
>> > > \end{lstlisting}
>> > >
>> > > Stream sockets provide in-order, guaranteed, connection-oriented delivery
>> > > without message boundaries. Seqpacket sockets provide in-order, guaranteed,
>> > > -connection-oriented delivery with message and record boundaries.
>> > > +connection-oriented delivery with message and record boundaries. Datagram
>> > > +sockets provide connection-less, best-effort delivery of messages, with no
>> > > +order or reliability guarantees.
>> > >
>> > > \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device / Device Operation / Buffer Space Management}
>> > > \field{buf_alloc} and \field{fwd_cnt} are used for buffer space management of
>> > > @@ -203,16 +209,19 @@ \subsubsection{Buffer Space Management}\label{sec:Device Types / Socket Device /
>> > > previously receiving a VIRTIO_VSOCK_OP_CREDIT_REQUEST packet. This allows
>> > > communicating updates any time a change in buffer space occurs.
>> > >
>> > > +\field{buf_alloc} and \field{fwd_cnt} are reserved for future use by datagram
>> > > +sockets. These fields are not used for datagram buffer space management.
>> > > +
>> > > \drivernormative{\paragraph}{Device Operation: Buffer Space Management}{Device Types / Socket Device / Device Operation / Buffer Space Management}
>> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
>> > > -sufficient free buffer space for the payload.
>> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
>> > > +transmitted when the peer has sufficient free buffer space for the payload.
>> > >
>> > > All packets associated with a stream flow MUST contain valid information in
>> > > \field{buf_alloc} and \field{fwd_cnt} fields.
>> > >
>> > > \devicenormative{\paragraph}{Device Operation: Buffer Space
>> > > Management}{Device Types / Socket Device / Device Operation / Buffer
>> > > Space Management}
>> > > -VIRTIO_VSOCK_OP_RW data packets MUST only be transmitted when the peer has
>> > > -sufficient free buffer space for the payload.
>> > > +For stream and seqpacket flows, VIRTIO_VSOCK_OP_RW data packets MUST only be
>> > > +transmitted when the peer has sufficient free buffer space for the payload.
>> > >
>> > > All packets associated with a stream flow MUST contain valid information in
>> > > \field{buf_alloc} and \field{fwd_cnt} fields.
>> > > @@ -299,6 +308,78 @@ \subsubsection{Seqpacket Sockets}\label{sec:Device Types / Socket Device / Devic
>> > > #define VIRTIO_VSOCK_SEQ_EOR (1 << 1)
>> > > \end{lstlisting}
>> > >
>> > > +\subsubsection{Datagram Sockets}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets}
>> > > +
>> > > +\drivernormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
>> > > +
>> > > +Drivers MAY disassemble packets into smaller fragments. If drivers fragment a
>> > > +packet, they MUST follow the fragmentation rules described in section
>> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>> > > +
>> > > +Drivers MUST support assembly of received packet fragments according to the
>> > > +fragmentation rules described in section
>> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram
>> > > Sockets / Fragmentation}.
>> > > +
>> > > +\devicenormative{\paragraph}{Device Operation: Packet Fragmentation}{Device Types / Socket Device / Datagram Sockets / Fragmentation}
>> > > +
>> > > +Devices MAY disassemble packets into smaller fragments. If devices fragment a
>> > > +packet, they MUST follow the fragmentation rules described in section
>> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>> > > +
>> > > +Devices MUST support assembly of received packet fragments according to the
>> > > +fragmentation rules described in section
>> > > +\ref{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}.
>> > > +
>> > > +\drivernormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
>> > > +
>> > > +The driver MAY drop received packets with no notification to the device. This
>> > > +can happen if, for example, there are insufficient resources or no socket
>> > > +exists for the destination address.
>> > > +
>> > > +\devicenormative{\paragraph}{Device Operation: Packet Dropping}{Device Types / Socket Device / Datagram Sockets / Dropping}
>> > > +
>> > > +The device MAY drop received packets with no notification to the driver. This
>> > > +can happen if, for example, there are insufficient resources or no socket
>> > > +exists for the destination address.
>> >
>> > Should we provide some notification if the socket does not exist at the
>> > destination?
>> >
>>
>> Yes, I think so. I believe a start/stop congestion notification scheme
>> actually manages this issue well.
>>
>> For example, the source begins sending packets to a destination.
>>
>> The destination finds that there exists no socket for that destination
>> address. The destination sends a "stop" notification to the source that
>> contains the address in question. Meanwhile, packets are still coming in
>> but they are being dropped.
>>
>> The source receives the "stop" notification with the address and adds it
>> to the "stopped destinations" list. Any new packet destination address
>> will be compared to that list. Any matches will be dropped before
>> sending (and ideally, before wasting time allocating the packet).
>>
>> Only when a socket is bound to an address that matches a "stopped"
>> address does the destination send a "start" notification to any source
>> it has previusly sent a "stop" notification to.

mmm, keeping the state forever could facilitate a DoS, perhaps we
should provide a timeout after which to try again

>>
>> Once "start" is received, flow may resume as normal.
>
>Again, dropping as control flow tactic has a bunch of problems.
>Blocking senders sounds more reasonable.

Yep, I think so.

>
>
>> > > +
>> > > +\paragraph{Datagram Fragmentation}\label{sec:Device Types / Socket Device / Device Operation / Datagram Sockets / Fragmentation}
>> > > +
>> > > +\field{flags} may have the following bit set:
>> > > +
>> > > +\begin{lstlisting}
>> > > +#define VIRTIO_VSOCK_DGRAM_EOM (1 << 0)
>> > > +\end{lstlisting}
>> > > +
>> > > +When the header \field{flags} field bit VIRTIO_VSOCK_DGRAM_EOM (bit 0) is set,
>> > > +it indicates that the current payload is the end of a datagram fragment
>> > > OR that
>> > > +the current payload is an entire datagram packet.
>> >
>> > In the destination, if we discard some fragments, then could we
>> > reconstruct a different datagram from the one sent?
>> >
>> > Is that anything acceptable?
>> >
>>
>> Dropping fragments should be explicitly disallowed. The sender is
>> explicitly disallowed from NOT placing fragments on the virtqueue, but I
>> see that I am missing the piece that states that they may not be dropped
>> on the receive side.
>>
>> I think it is worth mentioning that implicit in this spec is that
>> socket-to-socket dgram communication is unreliable, but device-to-driver
>> (and vice versa) is still reliable. That is, we can rely at least on the
>> virtqueues to work... and if they fail then the device/driver can simply
>> requeue (think send_pkt_queue in Linux)... so there is some reliability
>> at the lowest layer.

Yep, we should clarify this better.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-09-06 14:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230829212945.3420727-1-bobby.eshleman@bytedance.com>
2023-08-29 22:21 ` [PATCH] virtio-vsock: add VIRTIO_VSOCK_F_DGRAM feature bit Michael S. Tsirkin
     [not found]   ` <ZO6Ql9iWi6a8vDyU@bullseye>
2023-09-01 13:31     ` Michael S. Tsirkin
     [not found]       ` <ZPK7/84QeSHtYGBQ@bullseye>
2023-09-02  8:41         ` Michael S. Tsirkin
     [not found]           ` <ZPL/Ss6sXmfU/0Mg@bullseye>
2023-09-06  8:17             ` Michael S. Tsirkin
2023-09-01 12:45 ` Stefano Garzarella
     [not found]   ` <ZPLAip/TWqvWZ0hv@bullseye>
2023-09-02  8:35     ` Michael S. Tsirkin
2023-09-06 14:28       ` Stefano Garzarella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).