From: Stefan Hajnoczi <stefanha@redhat.com>
To: zhenwei pi <pizhenwei@bytedance.com>
Cc: virtio-comment@lists.oasis-open.org
Subject: Re: [virtio-comment] Re: [PATCH v2 05/11] transport-fabrics: introduce Keyed Transmission
Date: Thu, 1 Jun 2023 07:33:22 -0400
Message-ID: <20230601113322.GA1538357@fedora>
In-Reply-To: <aad519cd-5114-6144-8520-a258745c5e4a@bytedance.com>


On Thu, Jun 01, 2023 at 05:02:45PM +0800, zhenwei pi wrote:
> 
> 
> On 6/1/23 00:20, Stefan Hajnoczi wrote:
> > On Thu, May 04, 2023 at 04:19:04PM +0800, zhenwei pi wrote:
> > > Keyed transmission is used for message oriented communication(Ex RDMA),
> > > also add virtio-blk read/write 8K example.
> > > 
> > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> > > ---
> > >   transport-fabrics.tex | 178 ++++++++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 178 insertions(+)
> > > 
> > > diff --git a/transport-fabrics.tex b/transport-fabrics.tex
> > > index c02cf26..7711321 100644
> > > --- a/transport-fabrics.tex
> > > +++ b/transport-fabrics.tex
> > > @@ -317,3 +317,181 @@ \subsubsection{Buffer Mapping Definition}\label{sec:Virtio Transport Options / V
> > >                       |......|
> > >                       +------+  -> 8193
> > >   \end{lstlisting}
> > > +
> > > +\paragraph{Keyed Transmission}\label{sec:Virtio Transport Options / Virtio Over Fabrics / Transmission Protocol / Commands Definition / Keyed Transmission}
> > > +Command and Segment Descriptors are transmitted in a message within a
> > > +connection, and buffer is transmitted by remote memory access.  The layout in message:
> > 
> > With RDMA it is theoretically possible to implement virtqueues without
> > messages in the data path (i.e. by using something similar to vring with
> > RDMA). Why did you decide to use a mixed messages + RDMA approach
> > instead of a 100% RDMA approach?
> > 
> 
> Hi,
> 
> To reduce networking RTT. From my experience, a single RDMA message(event
> based) uses at least 6us.
> This approach has a chance to send a command(include data segments) by 1
> networking RTT, and receive a completion(include data segments) in 1
> networking RTT. I tried to design a 100% RDMA approach(mapping a vring to
> the remote side, the remote side accesses this vring by RDMA READ/WRITE),
> but I failed to find an idea to achieve.

The goal is to minimize the number of RDMA transfers. Each area of
memory should be located on the system that is polling constantly (busy
waiting), and the other side occasionally sends an RDMA WRITE request.

This idea requires bi-directional RDMA where both initiator and target
make memory accessible to the other side. Is this possible?

The target owns the Available Ring, a descriptor table similar to those
used by the Split and Packed Virtqueue layouts that is used by the
driver to submit virtqueue buffers to the device. The target sends a key
for the Available Ring to the initiator during virtqueue setup. The
initiator sends RDMA WRITEs that fill in virtqueue descriptors. Indirect
descriptors are supported, but the target will need to use RDMA READs to
load the indirect descriptor table, so there is overhead. Even regular
non-indirect descriptors have overhead because an RDMA READ is required
to read the payload. The best approach for small virtqueue elements is
to inline the payload in the Available Ring descriptor so no additional
RDMA transfers are needed (this achieves similar effect to your approach
of using messages + RDMA, but with pure RDMA). The target polls the
Available Ring to detect available buffers.
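
To make the inline-payload idea concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions, not a proposed
layout: the 64-byte slot size, the header fields, and the names AvailRing,
rdma_write, and submit_inline are all hypothetical, and a bytearray stands
in for target-owned memory registered for RDMA. A real design would also
need a way for the initiator to learn that a slot is free again, which the
sketch skips.

```python
import struct

# Hypothetical slot layout: 8-byte header followed by inline payload.
SLOT_SIZE = 64
HDR = struct.Struct("<HHI")          # flags, buffer id, payload length
INLINE_MAX = SLOT_SIZE - HDR.size    # bytes available for inline payload
F_AVAIL = 1                          # slot holds an available descriptor


class AvailRing:
    """Target-owned ring; a bytearray stands in for an RDMA memory region."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.mem = bytearray(num_slots * SLOT_SIZE)
        self.next_poll = 0           # target-side polling cursor

    def rdma_write(self, offset, data):
        """Stand-in for the initiator's RDMA WRITE (here a plain copy)."""
        self.mem[offset:offset + len(data)] = data

    def poll(self):
        """Target side: return (buffer_id, payload), or None if idle."""
        off = self.next_poll * SLOT_SIZE
        flags, buf_id, length = HDR.unpack_from(self.mem, off)
        if not flags & F_AVAIL:
            return None
        start = off + HDR.size
        payload = bytes(self.mem[start:start + length])
        # Clear the slot so a stale descriptor is not seen after wrap-around.
        self.mem[off:off + SLOT_SIZE] = bytes(SLOT_SIZE)
        self.next_poll = (self.next_poll + 1) % self.num_slots
        return buf_id, payload


def submit_inline(ring, slot, buf_id, payload):
    """Initiator side: one write carries both descriptor and payload."""
    if len(payload) > INLINE_MAX:
        raise ValueError("payload too large to inline")
    desc = HDR.pack(F_AVAIL, buf_id, len(payload)) + payload
    ring.rdma_write(slot * SLOT_SIZE, desc)
```

With this shape a small request costs a single write from the initiator and
no RDMA READ from the target; larger payloads would fall back to a
descriptor that the target follows with an RDMA READ.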

The initiator sends a key for the Used Ring to the target during
virtqueue setup. The target sends RDMA WRITEs that fill in used
elements. The initiator polls the Used Ring to detect used buffers.
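
The Used Ring direction can be sketched the same way, again in Python and
again only as an illustration. The element layout (flags, buffer id, bytes
written) and the names UsedRing, rdma_write, and complete are assumptions
made for the example; the bytearray stands in for initiator-owned memory
that the target may write to.

```python
import struct

# Hypothetical used element: flags, buffer id, bytes written by the device.
USED_ELEM = struct.Struct("<HHI")
F_USED = 1


class UsedRing:
    """Initiator-owned ring; a bytearray stands in for an RDMA memory region."""

    def __init__(self, num_elems):
        self.num_elems = num_elems
        self.mem = bytearray(num_elems * USED_ELEM.size)
        self.next_poll = 0           # initiator-side polling cursor

    def rdma_write(self, offset, data):
        """Stand-in for the target's RDMA WRITE (here a plain copy)."""
        self.mem[offset:offset + len(data)] = data

    def poll(self):
        """Initiator side: return (buffer_id, written), or None if idle."""
        off = self.next_poll * USED_ELEM.size
        flags, buf_id, written = USED_ELEM.unpack_from(self.mem, off)
        if not flags & F_USED:
            return None
        # Clear the element so it is not reported twice after wrap-around.
        self.mem[off:off + USED_ELEM.size] = bytes(USED_ELEM.size)
        self.next_poll = (self.next_poll + 1) % self.num_elems
        return buf_id, written


def complete(ring, index, buf_id, written):
    """Target side: one write fills in a used element."""
    ring.rdma_write(index * USED_ELEM.size,
                    USED_ELEM.pack(F_USED, buf_id, written))
```

Note that poll() here is pure busy-waiting: nothing in this scheme wakes up
an initiator that is not actively polling.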

I'm not sure if the Used Ring makes sense as RDMA memory. Maybe it's
better to send a message over the reliable connection instead so that
Used Buffer Notifications can support interrupts and not just polling.

This is a new virtqueue layout. It's only worthwhile implementing it if
the Available Ring RDMA performance is significantly better than the
current approach.

Stefan