From: Stefan Hajnoczi <stefanha@redhat.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: parav@nvidia.com, mst@redhat.com, jasowang@redhat.com,
virtio-comment@lists.oasis-open.org, houp@yusur.tech,
helei.sig11@bytedance.com, xinhao.kong@duke.edu
Subject: [virtio-comment] Re: Re: Re: [PATCH v2 06/11] transport-fabrics: introduce command set
Date: Tue, 6 Jun 2023 09:34:44 -0400 [thread overview]
Message-ID: <20230606133444.GC1958291@fedora> (raw)
In-Reply-To: <54bb85af-7979-8226-cfef-d72c1cf2332f@bytedance.com>
[-- Attachment #1: Type: text/plain, Size: 9826 bytes --]
On Tue, Jun 06, 2023 at 09:31:27AM +0800, zhenwei pi wrote:
> On 6/6/23 00:30, Stefan Hajnoczi wrote:
> > On Fri, Jun 02, 2023 at 01:15:00PM +0800, zhenwei pi wrote:
> > >
> > >
> > > On 6/1/23 01:10, Stefan Hajnoczi wrote:
> > > > On Thu, May 04, 2023 at 04:19:05PM +0800, zhenwei pi wrote:
> > > > > Introduce command structures for Virtio-oF.
> > > > >
> > > > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> > > > > ---
> > > > > transport-fabrics.tex | 209 ++++++++++++++++++++++++++++++++++++++++++
> > > > > 1 file changed, 209 insertions(+)
> > > > >
> > > > > diff --git a/transport-fabrics.tex b/transport-fabrics.tex
> > > > > index 7711321..37f57c6 100644
> > > > > --- a/transport-fabrics.tex
> > > > > +++ b/transport-fabrics.tex
> > > > > @@ -495,3 +495,212 @@ \subsubsection{Buffer Mapping Definition}\label{sec:Virtio Transport Options / V
> > > > > |value | -> 8193 (value.u32)
> > > > > +------+
> > > > > \end{lstlisting}
> > > > > +
> > > > > +\subsubsection{Commands Definition}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition}
> > > > > +This section defines command structures for Virtio Over Fabrics.
> > > > > +
> > > > > +A common structure virtio_of_value is fixed to 8 bytes and MUST be used as one
> > > > > +of the following format:
> > > > > +
> > > > > +\begin{itemize}
> > > > > +\item u8
> > > > > +\item le16
> > > > > +\item le32
> > > > > +\item le64
> > > > > +\end{itemize}
> > > >
> > > > The way it's written does not document where the u8, u16, u32 bytes are
> > > > located and that the unused bytes are 0. I think I understand what you
> > > > mean though:
> > > >
> > > > le64 value = cpu_to_le64((u64)v); /* v is u8, u16, u32, or u64 */
> > > >
> > > > Please clarify.
> > > >
> > >
> > > I want to describe an union structure of 8 bytes:
> > > union virtio_of_value {
> > > u8;
> > > u16;
> > > u32;
> > > u64;
> > > };
> > >
> > > Depending on the opcode, use the right one.
> >
> > I was trying to point out that the memory layout of C unions is not
> > portable. Your example does not define the exact in-memory layout of
> > union virtio_of_value. Here is the first web search result I found about
> > this topic:
> >
> > "Q: And a related question: if you dump unions in binary form to a file,
> > and then reload them from the file on a different platform, or with a
> > program compiled by a different compiler, are you guaranteed to get
> > back what you stored? (I think not, but I'm not sure)
> >
> > A: You're right; you're not."
> >
> > https://bytes.com/topic/c/answers/220372-unions-storage-abis
> >
> > In the cpu_to_le64() code example that I gave, the exact in-memory
> > layout is well-defined. There is no ambiguity.
> >
>
> OK, thanks.
>
> > > > > +\hline
> > > > > +0xff00 - 0xfffd & Reserved \\
> > > > > +\hline
> > > > > +\end{tabular}
> > > > > +
> > > > > +\paragraph{Connect Command}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition / Connect Command}
> > > > > +The Connect Command is used to establish Virtio Over Fabrics queue. The control
> > > > > +queue MUST be established firstly, then the Connect command establishes an
> > > > > +association between the initiator and the target.
> > > >
> > > > Is a "Virtio Over Fabrics queue" different from a virtqueue?
> > > >
> > > > If I understand correctly, the control queue must be established by the
> > > > initiator first and then the Connect command is sent to begin
> > > > communication between the initiator and the target?
> > > >
> > >
> > > The queue mapping is missing in the '[PATCH v2 01/11] transport-fabrics:
> > > introduce Virtio Over Fabrics overview', like:
> > > A "Virtio Over Fabrics queue" is a reliable connection between initiator and
> > > target. There are 2 types of Virtio Over Fabrics queue:
> > > +\begin{itemize}
> > > +\item A single Control queue is required to execute control operations.
> > > +\item 0 or more Virtio Over Fabrics queues map the virtqueues.
> > > +\end{itemize}
> >
> > That helps, thanks!
> >
> > >
> > > > > +
> > > > > +The Target ID of 0xffff is reserved, then:
> > > >
> > > > Please move this after the fields have been shown and the purpose of the
> > > > Target ID field has been explained.
> > > >
> > > > > +\begin{itemize}
> > > > > +\item The Target ID of 0xffff MUST be specified as the Target ID in a Connect
> > > > > +Command for the control queue.
> > > > > +\item The target SHOULD allocate any available Target ID to the initiator,
> > > > > +and return the allocated Target ID in the Completion.
> > > > > +\item The returned Target ID MUST be specified as the Target ID, and the Queue ID
> > > > > +MUST be specified in a Connect Command for the virtqueue.
> > > > > +\end{itemize}
> > > >
> > > > What is the purpose of the Target ID? Is it to allow a server to provide
> > > > access to multiple targets over the same connection?
> > > >
> > >
> > > A target listens on a port, and provides access to 0 or more targets. An
> > > initiator connect the specific target by TVQN of connect command.
> > > An initiator could connect a single target, multiple initiators could
> > > connect the same target(typically, shared disk/fs).
> >
> > Why is the target ID separate from the TVQN? If the Target ID is a
> > separate parameter then users will have to learn additional
> > syntax/command-line options to specify the TVQN + Target ID and that
> > syntax may vary between software.
> >
> > >
> > > > > +
> > > > > +The Connect Command has following structure:
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +struct virtio_of_command_connect {
> > > > > + le16 opcode;
> > > > > + le16 command_id;
> > > > > + le16 target_id;
> > > > > + le16 queue_id;
> > > > > + le16 ndesc;
> > > >
> > > > Where is this field documented?
> > > >
> > >
> > > OK. Will add.
> > >
> > > > Why does the initiator send ndesc to the target? Normally a VIRTIO Transport reports the device's max descriptors and then the driver can tell the device to reduce the number of descriptors, if desired.
> > > >
> > >
> > > A target supports at lease 1 descriptor. The 'ndesc' of struct
> > > virtio_of_command_connect indicates the full PDU contains: struct
> > > virtio_of_command_connect + 1 * virtio_of_vq_desc + data.
> > >
> > > > > +#define VIRTIO_OF_CONNECTION_TCP 1
> > > > > +#define VIRTIO_OF_CONNECTION_RDMA 2
> > > >
> > > > What does RDMA mean? I thought RDMA is a general concept that several
> > > > fabrics implement (with different details like how addressing works).
> > > >
> > >
> > > I guest your concern is the difference of IB/RoCE/iWarp ...
> > > We are trying to define the payload protocol here, so I think we can ignore
> > > the difference of the HCA.
> >
> > I see, maybe this could be called STREAM vs KEYED instead of TCP vs RDMA?
> >
>
> I'd like to define two PDU mapping rules(in '[PATCH v2 04/11]
> transport-fabrics: introduce Stream Transmission' and '[PATCH v2 05/11]
> transport-fabrics: introduce Keyed Transmission'): STREAM and KEYED. A
> transport protocols need to use one.
>
> Then we can define protocols:
> #define VIRTIO_OF_CONNECTION_TCP 1 -> use STREAM
> #define VIRTIO_OF_CONNECTION_RDMA 2 -> use KEYED
> #define VIRTIO_OF_CONNECTION_TLS 3(in the future) -> use STREAM
> #define VIRTIO_OF_CONNECTION_XXX
It's not clear to me whether TCP actually means TCP/IP or if it actually
means STREAM. For example, if I run Virtio Over Fabrics over AF_VSOCK,
would it use VIRTIO_OF_CONNECTION_TCP although there is no TCP/IP? If
so, then I think the name TCP is misleading and STREAM would be clearer.
>
> > >
> > > > > + u8 oftype;
> > > > > + u8 padding[5];
> > > > > +};
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +The Connect commands MUST contains one Segment Descriptor and one structure
> > > > > +virtio_of_command_connect to specify Initiator VQN and Target VNQ,
> > > > > +virtio_of_command_connect has following structure:
> > > >
> > > > I'm confsued. virtio_of_command_connect was defined above. The struct
> > > > defined below is virtio_of_connect. Does this paragraph need to be
> > > > updated (virtio_of_command_connect -> virtio_of_connect)?
> > > >
> > > > Why is virtio_of_connect a separate struct and not part of
> > > > virtio_of_command_connect?
> > > >
> > >
> > > Because I'd like to define all the commands with a fixed length.
> >
> > I don't understand. virtio_of_connect and virtio_of_command_connect are
> > both fixed-length. Why can't they be unified into 1 fixed-length struct?
> >
>
> For stream protocol, it always work fine.
> For keyed protocol, for example RDMA, the target side needs to use
> ibv_post_recv to receive a large size(sizeof virtio_of_command_connect +
> sizeof virtio_of_connect). If the target uses ibv_post_recv to receive
> sizeof(CMD) + sizeof(DESC) * 1, the initiator fails in RDMA SEND.
I read that "A RC connection is very similar to a TCP connection" in the
NVIDIA documentation
(https://docs.nvidia.com/networking/display/RDMAAwareProgrammingv17/Transport+Modes)
and expected SOCK_STREAM semantics for RDMA SEND.
Are you saying ibv_post_send() fails when the receiver's work request
sg_list size is smaller (fewer bytes) than the sender's?
Does the receiver have just 1 WR queued in your example? What happens if
the receiver queues multiple small WRs?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2023-06-06 13:35 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-04 8:18 [virtio-comment] [PATCH v2 00/11] Introduce Virtio Over Fabrics zhenwei pi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 01/11] transport-fabrics: introduce Virtio Over Fabrics overview zhenwei pi
2023-05-04 8:57 ` David Hildenbrand
2023-05-04 9:46 ` zhenwei pi
2023-05-04 10:05 ` Michael S. Tsirkin
2023-05-04 10:12 ` David Hildenbrand
2023-05-04 10:50 ` Re: " zhenwei pi
2023-05-31 14:00 ` [virtio-comment] " Stefan Hajnoczi
2023-06-02 1:17 ` [virtio-comment] " zhenwei pi
2023-06-05 2:39 ` [virtio-comment] " Parav Pandit
2023-06-05 2:39 ` Parav Pandit
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 02/11] transport-fabrics: introduce Virtio Qualified Name zhenwei pi
2023-05-31 14:06 ` Stefan Hajnoczi
2023-06-02 1:50 ` zhenwei pi
2023-06-05 2:40 ` Parav Pandit
2023-06-05 7:57 ` zhenwei pi
2023-06-05 17:05 ` Stefan Hajnoczi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 03/11] transport-fabircs: introduce Segment Descriptor Definition zhenwei pi
2023-05-31 14:23 ` Stefan Hajnoczi
2023-06-02 3:08 ` zhenwei pi
2023-06-05 2:40 ` [virtio-comment] " Parav Pandit
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 04/11] transport-fabrics: introduce Stream Transmission zhenwei pi
2023-05-31 15:20 ` Stefan Hajnoczi
2023-06-02 2:26 ` zhenwei pi
2023-06-05 16:11 ` Stefan Hajnoczi
2023-06-06 3:13 ` zhenwei pi
2023-06-06 13:09 ` Stefan Hajnoczi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 05/11] transport-fabrics: introduce Keyed Transmission zhenwei pi
2023-05-31 16:20 ` [virtio-comment] " Stefan Hajnoczi
2023-06-01 9:02 ` zhenwei pi
2023-06-01 11:33 ` Stefan Hajnoczi
2023-06-01 13:09 ` zhenwei pi
2023-06-01 19:13 ` Stefan Hajnoczi
2023-06-01 21:23 ` Stefan Hajnoczi
2023-06-02 0:55 ` zhenwei pi
2023-06-05 17:21 ` Stefan Hajnoczi
2023-06-05 2:41 ` Parav Pandit
2023-06-05 8:41 ` zhenwei pi
2023-06-05 11:45 ` Parav Pandit
2023-06-05 12:50 ` zhenwei pi
2023-06-05 13:12 ` Parav Pandit
2023-06-06 7:13 ` zhenwei pi
2023-06-06 21:52 ` Parav Pandit
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 06/11] transport-fabrics: introduce command set zhenwei pi
2023-05-31 17:10 ` [virtio-comment] " Stefan Hajnoczi
2023-06-02 5:15 ` [virtio-comment] " zhenwei pi
2023-06-05 16:30 ` Stefan Hajnoczi
2023-06-06 1:31 ` [virtio-comment] " zhenwei pi
2023-06-06 13:34 ` Stefan Hajnoczi [this message]
2023-06-07 2:58 ` [virtio-comment] " zhenwei pi
2023-06-08 16:41 ` Stefan Hajnoczi
2023-06-08 17:01 ` [virtio-comment] " Parav Pandit
2023-06-09 1:39 ` [virtio-comment] " zhenwei pi
2023-06-09 2:06 ` [virtio-comment] " Parav Pandit
2023-06-09 3:55 ` zhenwei pi
2023-06-11 20:56 ` Parav Pandit
2023-06-06 2:02 ` [virtio-comment] " zhenwei pi
2023-06-06 13:44 ` Stefan Hajnoczi
2023-06-07 2:03 ` [virtio-comment] " zhenwei pi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 07/11] transport-fabrics: introduce opcodes zhenwei pi
2023-05-31 17:11 ` [virtio-comment] " Stefan Hajnoczi
[not found] ` <20230531205508.GA1509630@fedora>
2023-06-02 8:39 ` [virtio-comment] " zhenwei pi
2023-06-05 16:46 ` Stefan Hajnoczi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 08/11] transport-fabrics: introduce status of completion zhenwei pi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 09/11] transport-fabrics: add TCP&RDMA binding zhenwei pi
[not found] ` <20230531210255.GC1509630@fedora>
2023-06-02 9:07 ` [virtio-comment] Re: " zhenwei pi
2023-06-05 16:57 ` Stefan Hajnoczi
2023-06-06 1:41 ` [virtio-comment] " zhenwei pi
2023-06-06 13:51 ` Stefan Hajnoczi
2023-06-07 2:15 ` zhenwei pi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 10/11] transport-fabrics: add device initialization zhenwei pi
[not found] ` <20230531210925.GD1509630@fedora>
2023-06-02 9:11 ` zhenwei pi
2023-05-04 8:19 ` [virtio-comment] [PATCH v2 11/11] transport-fabrics: support inline data for keyed transmission zhenwei pi
2023-05-29 0:56 ` [virtio-comment] PING: [PATCH v2 00/11] Introduce Virtio Over Fabrics zhenwei pi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230606133444.GC1958291@fedora \
--to=stefanha@redhat.com \
--cc=helei.sig11@bytedance.com \
--cc=houp@yusur.tech \
--cc=jasowang@redhat.com \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=pizhenwei@bytedance.com \
--cc=virtio-comment@lists.oasis-open.org \
--cc=xinhao.kong@duke.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox