* [virtio-comment] [PROPOSAL] Virtio Over Fabrics(TCP/RDMA)
From: zhenwei pi @ 2023-04-23 11:29 UTC (permalink / raw)
  To: Michael S . Tsirkin, Cornelia Huck, parav
  Cc: virtio-dev, virtio-comment@lists.oasis-open.org,
	helei.sig11@bytedance.com, houp

Hi,

Over the past years, virtio has come to support many device 
specifications over the PCI/MMIO/CCW transports. These devices work well 
in virtualization environments, and we now have a chance to support the 
virtio device family in the container/host scenario as well.

- Theory
"Virtio Over Fabrics" aims to reuse the existing virtio device 
specifications and to provide network-defined peripheral devices.
The protocol could also be used in a virtualization environment: 
typically the hypervisor (or a vhost-user process) handles requests from 
virtio PCI/MMIO/CCW, remaps them, and forwards them to the target over 
the fabric.

- Protocol
For the detailed protocol definition, see:
https://github.com/pizhenwei/linux/blob/virtio-of-github/include/uapi/linux/virtio_of.h

Example of virtio-blk read/write by TCP/RDMA:
1. Virtio Over TCP
1.1 An example of virtio-blk write(8K) command:
Initiator side sends a stream buffer(command + 4 * desc + 8208 bytes):
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |length|  ->  8208
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
               +-----|addr  |  -> 0
               |     +------+
               |     |length|  -> 16 (virtio blk write command)
               |     +------+
               |     |id    |  -> 10
               |     +------+
               |     |flags |  -> VRING_DESC_F_NEXT
               |     +------+
               |
  DESC1        |     +------+
               | +---|addr  |  -> 16
               | |   +------+
               | |   |length|  -> 4096
               | |   +------+
               | |   |id    |  -> 11
               | |   +------+
               | |   |flags |  -> VRING_DESC_F_NEXT
               | |   +------+
               | |
  DESC2        | |   +------+
               | | +-|addr  |  -> 4112
               | | | +------+
               | | | |length|  -> 4096
               | | | +------+
               | | | |id    |  -> 12
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_NEXT
               | | | +------+
               | | |
  DESC3        | | | +------+
               | | | |addr  |  -> 0
               | | | +------+
               | | | |length|  -> 1
               | | | +------+
               | | | |id    |  -> 13
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_WRITE
               | | | +------+
               | | |
  DATA         +-+-+>+------+  -> 0
                 | | |......|
                 +-+>+------+  -> 16
                   | |......|
                   +>+------+  -> 4112
                     |......|
                     +------+  -> 8208

Target side sends a stream buffer(completion + 1 * desc + 1 bytes):
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |ndesc |  ->  1
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 1 (value.u32)
                     +------+

  DESC0              +------+
                   +-|addr  |  -> 0
                   | +------+
                   | |length|  -> 1
                   | +------+
                   | |id    |  -> 13
                   | +------+
                   | |flags |  -> VRING_DESC_F_WRITE
                   | +------+
                   |
  DATA             +>+------+  -> 0
                     |......|
                     +------+  -> 1
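
To make the offset arithmetic of example 1.1 explicit, here is a minimal 
sketch in C. It is only an illustration: struct sketch_desc, struct 
sketch_layout and sketch_assign_offsets() are hypothetical names, not the 
definitions in virtio_of.h, and the rule it applies (device-readable 
buffers are packed back to back in the request DATA, device-writable 
buffers in the completion DATA) is my reading of the diagrams above, not 
a normative statement. The VRING_DESC_F_* values are the ones from the 
virtio specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values from the virtio specification (split virtqueue descriptors). */
#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

/* Hypothetical descriptor for this sketch; NOT the layout in virtio_of.h. */
struct sketch_desc {
    uint64_t addr;   /* offset into the DATA area of the stream */
    uint32_t length; /* buffer length in bytes */
    uint16_t id;
    uint16_t flags;
};

/* Payload sizes of the two directions of one request. */
struct sketch_layout {
    uint32_t out_len; /* bytes the initiator sends after the descriptors */
    uint32_t in_len;  /* bytes the target returns after the completion */
};

/*
 * Assumed rule: device-readable buffers are packed back to back in the
 * request DATA, device-writable buffers are packed back to back in the
 * completion DATA. Each descriptor's addr becomes its offset in the
 * DATA area of its own direction.
 */
static struct sketch_layout sketch_assign_offsets(struct sketch_desc *descs,
                                                  size_t ndesc)
{
    struct sketch_layout layout = { 0, 0 };

    assert(descs != NULL || ndesc == 0);
    for (size_t i = 0; i < ndesc; i++) {
        uint32_t *cursor = (descs[i].flags & VRING_DESC_F_WRITE)
                               ? &layout.in_len
                               : &layout.out_len;

        descs[i].addr = *cursor;
        *cursor += descs[i].length;
    }
    return layout;
}
```

Fed with the four descriptors of 1.1 (lengths 16, 4096, 4096 and 1, the 
last one flagged VRING_DESC_F_WRITE), this gives the offsets 0, 16, 4112 
and 0, a request payload of 8208 bytes and a completion payload of 1 byte.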

1.2 An example of virtio-blk read(8K) command:
Initiator side sends a stream buffer(command + 4 * desc + 16 bytes):
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  14
                     +------+
                     |length|  ->  16 (virtio blk read command)
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
                   +-|addr  |  -> 0
                   | +------+
                   | |length|  -> 16
                   | +------+
                   | |id    |  -> 14
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT
                   | +------+
                   |
  DESC1            | +------+
                   | |addr  |  -> 16
                   | +------+
                   | |length|  -> 4096
                   | +------+
                   | |id    |  -> 15
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
                   | +------+
                   |
  DESC2            | +------+
                   | |addr  |  -> 4112
                   | +------+
                   | |length|  -> 4096
                   | +------+
                   | |id    |  -> 16
                   | +------+
                   | |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
                   | +------+
                   |
  DESC3            | +------+
                   | |addr  |  -> 0
                   | +------+
                   | |length|  -> 1
                   | +------+
                   | |id    |  -> 17
                   | +------+
                   | |flags |  -> VRING_DESC_F_WRITE
                   | +------+
                   |
  DATA             +>+------+  -> 0
                     |......|
                     +------+  -> 16

Target side sends a stream buffer(completion + 3 * desc + 8193 bytes):
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  14
                     +------+
                     |ndesc |  ->  3
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 8193 (value.u32)
                     +------+

  DESC0              +------+
               +-----|addr  |  -> 0
               |     +------+
               |     |length|  -> 4096
               |     +------+
               |     |id    |  -> 15
               |     +------+
               |     |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
               |     +------+
               |
  DESC1        |     +------+
               | +---|addr  |  -> 4096
               | |   +------+
               | |   |length|  -> 4096
               | |   +------+
               | |   |id    |  -> 16
               | |   +------+
               | |   |flags |  -> VRING_DESC_F_NEXT | VRING_DESC_F_WRITE
               | |   +------+
               | |
  DESC2        | |   +------+
               | | +-|addr  |  -> 8192
               | | | +------+
               | | | |length|  -> 1
               | | | +------+
               | | | |id    |  -> 17
               | | | +------+
               | | | |flags |  -> VRING_DESC_F_WRITE
               | | | +------+
               | | |
  DATA         +-+-+>+------+  -> 0
                 | | |......|
                 +-+>+------+  -> 4096
                   | |......|
                   +>+------+  -> 8192
                     |......|
                     +------+  -> 8193
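
On the initiator side, the completion of 1.2 has to be scattered back 
into the buffers of the original request. The following C sketch shows 
one way this could look, assuming that the id of a completion descriptor 
identifies the local buffer and that addr is an offset into the received 
DATA (0, 4096 and 8192 in the example). All names here (struct 
sketch_desc, struct sketch_buf, sketch_scatter_completion) are 
hypothetical and not taken from the initiator code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for this sketch; NOT the layout in virtio_of.h. */
struct sketch_desc {
    uint64_t addr;   /* offset into the DATA area of the completion */
    uint32_t length; /* number of bytes for this buffer */
    uint16_t id;
    uint16_t flags;
};

/* Hypothetical mapping from a descriptor id to an initiator-local buffer. */
struct sketch_buf {
    uint16_t id;
    uint8_t *ptr;
    uint32_t size;
};

/*
 * Copy each completion descriptor's slice of DATA into the local buffer
 * with the same id. Returns 0 on success, or -1 if a descriptor points
 * outside DATA, names an unknown id, or does not fit the local buffer.
 */
static int sketch_scatter_completion(const struct sketch_desc *descs,
                                     size_t ndesc,
                                     const uint8_t *data, uint32_t data_len,
                                     const struct sketch_buf *bufs,
                                     size_t nbuf)
{
    assert(data != NULL || data_len == 0);
    for (size_t i = 0; i < ndesc; i++) {
        const struct sketch_buf *dst = NULL;

        /* The slice [addr, addr + length) must lie inside DATA. */
        if (descs[i].addr > data_len ||
            descs[i].length > data_len - descs[i].addr)
            return -1;

        for (size_t j = 0; j < nbuf; j++) {
            if (bufs[j].id == descs[i].id) {
                dst = &bufs[j];
                break;
            }
        }
        if (dst == NULL || descs[i].length > dst->size)
            return -1;

        memcpy(dst->ptr, data + descs[i].addr, descs[i].length);
    }
    return 0;
}
```

The bounds checks matter because the descriptors arrive from the network; 
a real implementation would presumably also validate them against the 
descriptors it sent in the request.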

2. Virtio Over RDMA
2.1 An example of virtio-blk write(8K) command:
Initiator side sends a message (command + 4 * desc) by RDMA POST SEND:
  COMMAND            +------+
                     |opcode|  ->  virtio_of_op_vring
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |length|  ->  0
                     +------+
                     |ndesc |  ->  4
                     +------+
                     |rsvd  |
                     +------+

  DESC0              +------+
                     |addr  |  -> 0xffff012345670000
                     +------+
                     |length|  -> 16 (virtio blk write command)
                     +------+
                     |id    |  -> 10
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1234
                     +------+

  DESC1              +------+
                     |addr  |  -> 0xffff012345671000
                     +------+
                     |length|  -> 4096
                     +------+
                     |id    |  -> 11
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1236
                     +------+

  DESC2              +------+
                     |addr  |  -> 0xffff012345673000
                     +------+
                     |length|  -> 4096
                     +------+
                     |id    |  -> 12
                     +------+
                     |flags |  -> VRING_DESC_F_NEXT
                     +------+
                     |key   |  -> 0x1238
                     +------+

  DESC3              +------+
                     |addr  |  -> 0xffff012345677000
                     +------+
                     |length|  -> 1
                     +------+
                     |id    |  -> 13
                     +------+
                     |flags |  -> VRING_DESC_F_WRITE
                     +------+
                     |key   |  -> 0x1239
                     +------+

The target side reads the remote buffers described by DESC0/DESC1/DESC2 
by RDMA POST READ, writes the remote buffer described by DESC3 by RDMA 
POST WRITE, and then sends a completion by POST SEND:
  COMPLETION         +------+
                     |status|  ->  VIRTIO_OF_SUCCESS
                     +------+
                     |cmd id|  ->  10
                     +------+
                     |ndesc |  ->  0
                     +------+
                     |rsvd  |
                     +------+
                     |value |  -> 1 (value.u32)
                     +------+

2.2 An example of virtio-blk read(8K) command:
This is quite similar to 2.1 except for the flags in DESC1/DESC2, which 
are device-writable here as in 1.2: the target side reads the remote 
buffer described by DESC0 by RDMA POST READ, writes the remote buffers 
described by DESC1/DESC2/DESC3 by RDMA POST WRITE, and then sends a 
completion by POST SEND.
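
The per-descriptor decision described in 2.1 and 2.2 can be sketched in C 
as below. This is only an illustration under stated assumptions: struct 
sketch_rdma_desc, enum sketch_rdma_op and the two functions are 
hypothetical names, the 32-bit width of the key field is a guess, and no 
real verbs calls are made. The only rule applied is the one from the 
text: a descriptor flagged VRING_DESC_F_WRITE is served by an RDMA 
WRITE, any other descriptor by an RDMA READ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values from the virtio specification (split virtqueue descriptors). */
#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

/* Hypothetical RDMA descriptor; NOT the layout in virtio_of.h. */
struct sketch_rdma_desc {
    uint64_t addr;   /* remote address of the buffer on the initiator */
    uint32_t length; /* buffer length in bytes */
    uint16_t id;
    uint16_t flags;
    uint32_t key;    /* remote key for addr (width is an assumption) */
};

enum sketch_rdma_op { SKETCH_RDMA_READ, SKETCH_RDMA_WRITE };

/* Device-writable buffers are written by the target, all others are read. */
static enum sketch_rdma_op sketch_pick_op(const struct sketch_rdma_desc *desc)
{
    assert(desc != NULL);
    return (desc->flags & VRING_DESC_F_WRITE) ? SKETCH_RDMA_WRITE
                                              : SKETCH_RDMA_READ;
}

/* Sum up how many bytes the target pulls (READ) and pushes (WRITE). */
static void sketch_plan(const struct sketch_rdma_desc *descs, size_t ndesc,
                        uint64_t *read_bytes, uint64_t *write_bytes)
{
    assert(read_bytes != NULL && write_bytes != NULL);
    *read_bytes = 0;
    *write_bytes = 0;
    for (size_t i = 0; i < ndesc; i++) {
        if (sketch_pick_op(&descs[i]) == SKETCH_RDMA_WRITE)
            *write_bytes += descs[i].length;
        else
            *read_bytes += descs[i].length;
    }
}
```

For the write command of 2.1 this yields 8208 bytes to read and 1 byte to 
write; for the read command of 2.2 it yields 16 bytes to read and 8193 
bytes to write.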

- Example
I have developed a kernel initiator (unstable, WIP version; TCP and RDMA 
are currently supported):
https://github.com/pizhenwei/linux/tree/virtio-of-github

And a target (unstable, WIP version; blk/crypto/rng are currently supported):
https://github.com/pizhenwei/virtio-target/tree/WIP

Run the target first:
  ~# ./vtgt vtgt.conf
Then load the kernel modules on the initiator side:
  ~# insmod ./virtio_fabrics.ko
  ~# insmod ./virtio_tcp.ko
  ~# insmod ./virtio_rdma.ko

Create a virtio-blk device over TCP with the command (a single line):
  ~# echo command=create,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/block/block0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics

Or create a virtio-crypto device over RDMA with the command:
  ~# echo command=create,transport=rdma,taddr=192.168.122.1,tport=15771,tvqn=virtio-target/crypto/crypto0.service,iaddr=192.168.122.1,iport=0,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics

Or destroy a virtio-of device with the command:
  ~# echo command=destroy,transport=tcp,taddr=192.168.122.1,tport=15771,tvqn=vqn.uuid:2d5130d8-36d5-4fe8-ae55-48ea51e0391a,iaddr=192.168.122.1,ivqn=vqn.uuid:42761df9-4c3f-4b27-843d-c88d1dcdce32 > /dev/virtio-fabrics
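
For tooling that drives /dev/virtio-fabrics, the create string can be 
assembled programmatically. The C sketch below only formats the string 
with the keys shown in the commands above (command, transport, taddr, 
tport, tvqn, iaddr, iport, ivqn); sketch_format_create() is a 
hypothetical helper, and restricting transport to "tcp" or "rdma" is an 
assumption based on the two transports mentioned in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a "create" command into buf. Returns the string length on
 * success, or -1 if the transport is not "tcp"/"rdma" or buf is too small.
 */
static int sketch_format_create(char *buf, size_t size,
                                const char *transport,
                                const char *taddr, unsigned int tport,
                                const char *tvqn,
                                const char *iaddr, unsigned int iport,
                                const char *ivqn)
{
    int n;

    assert(buf != NULL && size > 0);
    if (strcmp(transport, "tcp") != 0 && strcmp(transport, "rdma") != 0)
        return -1;

    n = snprintf(buf, size,
                 "command=create,transport=%s,taddr=%s,tport=%u,tvqn=%s,"
                 "iaddr=%s,iport=%u,ivqn=%s",
                 transport, taddr, tport, tvqn, iaddr, iport, ivqn);
    /* snprintf reports the untruncated length; reject truncation. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The resulting buffer would then be written to /dev/virtio-fabrics in a 
single write, which is what the echo commands above do.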

-- 
zhenwei pi
