Discussion of the implementations of VIRTIO specification
From: Dust Li <dust.li@linux.alibaba.com>
To: Gerry <gerry@linux.alibaba.com>, Jason Wang <jasowang@redhat.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	virtio-dev@lists.oasis-open.org, hans@linux.alibaba.com,
	herongguang@linux.alibaba.com, zmlcc@linux.alibaba.com,
	tonylu@linux.alibaba.com, zhenzao@linux.alibaba.com,
	helinguo@linux.alibaba.com, mst@redhat.com, cohuck@redhat.com,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [virtio-dev] [PATCH 0/2] introduce virtio-ism: internal shared memory device
Date: Wed, 19 Oct 2022 16:21:36 +0800
Message-ID: <20221019082136.GA63658@linux.alibaba.com>
In-Reply-To: <90A95AD3-DCC6-474C-A0E6-13347B13A2B3@linux.alibaba.com>

On Wed, Oct 19, 2022 at 04:03:42PM +0800, Gerry wrote:
>
>
>> On Oct 19, 2022, at 16:01, Jason Wang <jasowang@redhat.com> wrote:
>> 
>> On Wed, Oct 19, 2022 at 3:00 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>> 
>>> On Tue, 18 Oct 2022 14:54:22 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> On Mon, Oct 17, 2022 at 8:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>> 
>>>>> On Mon, 17 Oct 2022 16:17:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> Adding Stefan.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Oct 17, 2022 at 3:47 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>>>>>>> 
>>>>>>> Hello everyone,
>>>>>>> 
>>>>>>> # Background
>>>>>>> 
>>>>>>> Nowadays, a common requirement is to accelerate communication between
>>>>>>> different VMs and containers, including lightweight-VM-based containers.
>>>>>>> One way to achieve this is to colocate them on the same host. However,
>>>>>>> inter-VM communication through the network stack does not perform
>>>>>>> optimally and may also waste extra CPU cycles. This scenario has been
>>>>>>> discussed many times, but there is still no generic solution [1] [2] [3].
>>>>>>> 
>>>>>>> With a PoC based on pci-ivshmem + SMC (Shared Memory Communications [4]) [5],
>>>>>>> we found that by changing the communication channel between VMs from TCP to
>>>>>>> SMC with shared memory, we can achieve superior performance for a common
>>>>>>> socket-based application [5]:
>>>>>>>  - latency reduced by about 50%
>>>>>>>  - throughput increased by about 300%
>>>>>>>  - CPU consumption reduced by about 50%
>>>>>>> 
>>>>>>> Since no existing shared memory management solution is particularly well
>>>>>>> suited to the needs of SMC (see ## Comparison with existing technology), and
>>>>>>> virtio is the standard for communication in the virtualization world, we want
>>>>>>> to implement a virtio-ism device based on virtio, which can support on-demand
>>>>>>> memory sharing across VMs, containers, or between a VM and a container. To
>>>>>>> match the needs of SMC, the virtio-ism device needs to support:
>>>>>>> 
>>>>>>> 1. Dynamic provision: shared memory regions are dynamically allocated and
>>>>>>>   provisioned.
>>>>>>> 2. Multi-region management: the shared memory is divided into regions,
>>>>>>>   and a peer may allocate one or more regions from the same shared memory
>>>>>>>   device.
>>>>>>> 3. Permission control: the permission of each region can be set separately.
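The three requirements above can be pictured as a small region table. This is a minimal sketch assuming a hypothetical in-memory model (the `IsmDevice`, `Region`, `alloc`, `grant` and `check` names and the READ/WRITE bits are illustrative, not the interface defined in virtio-ism.tex): regions are created on demand, one device holds many regions, and each region carries its own per-peer permissions.

```python
from dataclasses import dataclass, field

READ, WRITE = 1, 2  # hypothetical permission bits for this toy model


@dataclass
class Region:
    region_id: int
    size: int
    owner: str
    # Per-peer permission bits; the owner implicitly has READ | WRITE.
    perms: dict = field(default_factory=dict)


class IsmDevice:
    """Toy model: one device managing many dynamically allocated regions."""

    def __init__(self):
        self.regions = {}
        self._next_id = 0

    def alloc(self, owner, size):
        # Requirement 1: regions are provisioned on demand.
        region = Region(self._next_id, size, owner)
        self.regions[region.region_id] = region
        self._next_id += 1
        return region.region_id

    def free(self, region_id):
        del self.regions[region_id]

    def grant(self, region_id, peer, perm):
        # Requirement 3: permissions are set per region and per peer.
        self.regions[region_id].perms[peer] = perm

    def check(self, region_id, peer, perm):
        region = self.regions[region_id]
        if peer == region.owner:
            return True
        return (region.perms.get(peer, 0) & perm) == perm
```

Granting a peer access to one region (requirement 2: several regions from the same device) says nothing about the owner's other regions, which is the property the permission-control requirement asks for.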
>>>>>> 
>>>>>> Looks like virtio-ROCE
>>>>>> 
>>>>>> https://lore.kernel.org/all/20220511095900.343-1-xieyongji@bytedance.com/T/
>>>>>> 
>>>>>> and virtio-vhost-user can satisfy the requirement?
>>>>>> 
>>>>>>> 
>>>>>>> # Virtio ism device
>>>>>>> 
>>>>>>> ISM devices provide the ability to share memory between different guests on a
>>>>>>> host. Memory that a guest obtains from the ISM device can be shared with
>>>>>>> multiple peers at the same time, and these sharing relationships can be
>>>>>>> created and released dynamically.
>>>>>>> 
>>>>>>> The shared memory obtained from the device is divided into multiple ISM
>>>>>>> regions for sharing. The ISM device provides a mechanism to notify the other
>>>>>>> referrers of an ISM region of content update events.
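A minimal sketch of the update notification, assuming a hypothetical model in which each region keeps a set of referrers and an update event is queued for every referrer except the writer. The `RegionNotifier` class, `notify_update` and the per-peer event queues are illustrative only; the proposal does not define them here.

```python
from collections import defaultdict, deque


class RegionNotifier:
    """Toy model: fan out 'content updated' events to a region's referrers."""

    def __init__(self):
        self.referrers = defaultdict(set)  # region_id -> set of peer names
        self.queues = defaultdict(deque)   # peer name -> pending events

    def attach(self, region_id, peer):
        self.referrers[region_id].add(peer)

    def detach(self, region_id, peer):
        self.referrers[region_id].discard(peer)

    def notify_update(self, region_id, writer):
        # Every other current referrer of the region gets one event.
        for peer in sorted(self.referrers[region_id]):
            if peer != writer:
                self.queues[peer].append(("update", region_id))
```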
>>>>>>> 
>>>>>>> # Usage (SMC as example)
>>>>>>> 
>>>>>>> Here is one possible use case:
>>>>>>> 
>>>>>>> 1. SMC calls the ism driver interface ism_alloc_region(), which returns the
>>>>>>>   location of a memory region in the PCI space and a token.
>>>>>>> 2. The ism driver mmaps the memory region and returns it to SMC with the token.
>>>>>>> 3. SMC passes the token to the connected peer.
>>>>>>> 4. The peer calls the ism driver interface ism_attach_region(token) to
>>>>>>>   get the location of the shared memory in the PCI space.
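The four steps can be simulated end to end. This is only a sketch: ism_alloc_region() and ism_attach_region() are the driver interfaces named above, but the signatures used here, the random 64-bit token, and the bytearray standing in for the mmap'ed PCI memory are assumptions for illustration.

```python
import secrets


class IsmHost:
    """Toy stand-in for the host memory shared by all ISM devices."""

    def __init__(self):
        self.by_token = {}

    def ism_alloc_region(self, size):
        # Step 1: allocate a region and hand back (memory, token).
        token = secrets.randbits(64)
        while token in self.by_token:  # avoid reusing a live token
            token = secrets.randbits(64)
        mem = bytearray(size)  # stands in for the mmap'ed region (step 2)
        self.by_token[token] = mem
        return mem, token

    def ism_attach_region(self, token):
        # Step 4: the peer resolves the token to the very same memory.
        return self.by_token[token]


host = IsmHost()
local_mem, token = host.ism_alloc_region(4096)  # steps 1-2, local side
# Step 3: SMC sends `token` to the peer over the existing connection.
peer_mem = host.ism_attach_region(token)        # step 4, peer side
local_mem[0:5] = b"hello"
```

Only the token crosses the SMC connection; the data itself is written once into memory that both sides map.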
>>>>>>> 
>>>>>>> 
>>>>>>> # About hot plugging of the ism device
>>>>>>> 
>>>>>>>   Hot-plugging devices is a heavyweight, time-consuming, and less scalable
>>>>>>>   operation that may fail, so we don't plan to support it for now.
>>>>>>> 
>>>>>>> # Comparison with existing technology
>>>>>>> 
>>>>>>> ## ivshmem or ivshmem 2.0 of Qemu
>>>>>>> 
>>>>>>>   1. ivshmem 1.0 is a single large piece of memory that is visible to all VMs
>>>>>>>   that use the device, so its isolation is not sufficient for security.
>>>>>>> 
>>>>>>>   2. ivshmem 2.0 is shared memory that belongs to one VM and can be read
>>>>>>>   (read-only) by all other VMs that use the ivshmem 2.0 shared memory device,
>>>>>>>   which also does not meet our security needs.
>>>>>>> 
>>>>>>> ## vhost-pci and virtiovhostuser
>>>>>>> 
>>>>>>>   Does not support dynamic allocation and is therefore not suitable for SMC.
>>>>>> 
>>>>>> I think this is an implementation issue: if we support the VHOST IOTLB
>>>>>> message, then regions could be added/removed on demand.
>>>>> 
>>>>> 
>>>>> 1. After the attacker connects with the victim, if the attacker does not
>>>>>   release its reference to the memory, the memory remains occupied under
>>>>>   virtiovhostuser. With ISM devices, the victim can directly release its
>>>>>   reference, and the maliciously referenced region occupies only the
>>>>>   attacker's resources.
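Point 1 can be illustrated with a reference-counted region whose memory is charged to every VM still holding a reference. This is a minimal sketch under that assumption (the per-VM `charged` accounting and the `RefRegion`/`Accounting` names are hypothetical, not taken from the proposal): once the victim drops its reference, a region kept alive only by the attacker is billed to the attacker alone.

```python
from collections import Counter


class RefRegion:
    def __init__(self, size):
        self.size = size
        self.holders = set()


class Accounting:
    """Toy model: each holder of a region is charged for its size."""

    def __init__(self):
        self.charged = Counter()  # vm -> bytes charged to that VM

    def attach(self, region, vm):
        if vm not in region.holders:
            region.holders.add(vm)
            self.charged[vm] += region.size

    def release(self, region, vm):
        if vm in region.holders:
            region.holders.remove(vm)
            self.charged[vm] -= region.size
        # True means the last holder is gone and the region can be freed.
        return not region.holders
```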
>>>> 
>>>> Let's define the security boundary here, e.g. do we trust the device or
>>>> not? If yes, in the case of virtiovhostuser, can we simply do
>>>> VHOST_IOTLB_UNMAP, so that we can safely release the memory from the
>>>> attacker?
>>>> 
>>>>> 
>>>>> 2. The ISM device of a VM can be shared with many (1000+) VMs at the same
>>>>>   time, which is a challenge for virtiovhostuser.
>>>> 
>>>> Please elaborate more on the challenges. What makes
>>>> virtiovhostuser different?
>>> 
>>> As I understand it (please point out any mistakes), one vvu device corresponds
>>> to one VM. If we share memory with 1000 VMs, do we need 1000 vvu devices?
>> 
>> There could be some misunderstanding here. With 1000 VMs, you still
>> need 1000 virtio-ism devices, I think.
>We are trying to achieve one virtio-ism device per VM instead of one virtio-ism device per SMC connection.

I think we must achieve this if we want to meet the requirements of SMC.
In SMC, an SMC socket (corresponding to a TCP socket) needs 2 memory
regions (1 for Tx and 1 for Rx). So if we have 1K TCP connections,
we'll need 2K shared memory regions, and those memory regions are
dynamically allocated and freed along with the TCP sockets.
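The arithmetic and the lifecycle can be sketched as follows, assuming a hypothetical single per-VM pool (`IsmPool`, `open_socket` and `close_socket` are illustrative names only) from which every SMC socket takes one Tx and one Rx region and returns both on close.

```python
REGIONS_PER_SOCKET = 2  # one Tx region + one Rx region per SMC socket


class IsmPool:
    """Toy model: one device per VM; regions come and go with SMC sockets."""

    def __init__(self):
        self.in_use = set()
        self._next = 0

    def open_socket(self):
        tx, rx = self._next, self._next + 1
        self._next += REGIONS_PER_SOCKET
        self.in_use.update((tx, rx))
        return tx, rx

    def close_socket(self, regions):
        # Both regions are freed together with the socket.
        self.in_use.difference_update(regions)


pool = IsmPool()
sockets = [pool.open_socket() for _ in range(1000)]
```

With 1000 connections the single device holds 2000 live regions, which is why a per-connection device (or per-connection hot plug) does not scale here.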

>
>> 
>>> 
>>> 
>>>> 
>>>>> 
>>>>> 3. ISM sharing relationships are added dynamically, whereas virtiovhostuser
>>>>>   determines the sharing relationship at startup.
>>>> 
>>>> Not necessarily with IOTLB API?
>>> 
>>> Unlike virtio-vhost-user, which shares the memory of one VM with another VM,
>>> we provide the same host memory to two VMs, so the implementation of this
>>> part is much simpler. This is why we gave up on virtio-vhost-user at the
>>> beginning.
>> 
>> Ok, just to make sure we're on the same page. At the spec level,
>> virtio-vhost-user doesn't (can't) restrict the backend to being implemented
>> in another VM. So it should be fine to use it for sharing memory
>> between a guest and the host.
>> 
>> Thanks
>> 
>>> 
>>> Thanks.
>>> 
>>> 
>>>> 
>>>>> 
>>>>> 4. For security, the device under virtiovhostuser may mmap more memory,
>>>>>   while ISM maps only one region to other devices.
>>>> 
>>>> With VHOST_IOTLB_MAP, the map could be done per region.
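A toy model of per-region IOTLB mapping, to make the suggestion concrete. It is a sketch under assumptions: the entry fields loosely mirror the iova/size/uaddr/perm fields of Linux's `struct vhost_iotlb_msg`, whose UAPI names the map and unmap operations VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE; the `Iotlb` class and its methods are hypothetical and do not reproduce vhost's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IotlbEntry:
    iova: int
    size: int
    uaddr: int
    perm: str  # "ro", "wo" or "rw" in this toy model


class Iotlb:
    def __init__(self):
        self.entries = []

    def update(self, entry):
        # Map exactly one region; nothing outside it becomes reachable.
        self.entries.append(entry)

    def invalidate(self, iova, size):
        # Drop every entry fully covered by [iova, iova + size).
        self.entries = [e for e in self.entries
                        if not (iova <= e.iova and e.iova + e.size <= iova + size)]

    def translate(self, iova):
        for e in self.entries:
            if e.iova <= iova < e.iova + e.size:
                return e.uaddr + (iova - e.iova)
        return None  # unmapped: the access would miss
```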
>>>> 
>>>> Thanks
>>>> 
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>>> 
>>>>>>> # Design
>>>>>>> 
>>>>>>>   The following is a structure diagram of ISM sharing between two VMs.
>>>>>>> 
>>>>>>>    |-------------------------------------------------------------------------------------------------------------|
>>>>>>>    | |------------------------------------------------|       |------------------------------------------------| |
>>>>>>>    | | Guest                                          |       | Guest                                          | |
>>>>>>>    | |                                                |       |                                                | |
>>>>>>>    | |   ----------------                             |       |   ----------------                             | |
>>>>>>>    | |   |    driver    |     [M1]   [M2]   [M3]      |       |   |    driver    |             [M2]   [M3]     | |
>>>>>>>    | |   ----------------       |      |      |       |       |   ----------------               |      |      | |
>>>>>>>    | |    |cq|                  |map   |map   |map    |       |    |cq|                          |map   |map   | |
>>>>>>>    | |    |  |                  |      |      |       |       |    |  |                          |      |      | |
>>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>    | |----|--|----------------|  device memory  |-----|       |----|--|----------------|  device memory   |----| |
>>>>>>>    | |    |  |                -------------------     |       |    |  |                --------------------    | |
>>>>>>>    | |                                |               |       |                               |                | |
>>>>>>>    | |                                |               |       |                               |                | |
>>>>>>>    | | Qemu                           |               |       | Qemu                          |                | |
>>>>>>>    | |--------------------------------+---------------|       |-------------------------------+----------------| |
>>>>>>>    |                                  |                                                       |                  |
>>>>>>>    |                                  |                                                       |                  |
>>>>>>>    |                                  |------------------------------+------------------------|                  |
>>>>>>>    |                                                                 |                                           |
>>>>>>>    |                                                                 |                                           |
>>>>>>>    |                                                   --------------------------                                |
>>>>>>>    |                                                    | M1 |   | M2 |   | M3 |                                 |
>>>>>>>    |                                                   --------------------------                                |
>>>>>>>    |                                                                                                             |
>>>>>>>    | HOST                                                                                                        |
>>>>>>>    ---------------------------------------------------------------------------------------------------------------
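The diagram can be read as a mapping table: host memory holds M1 to M3, the first guest maps all three, and the second guest maps only M2 and M3. A tiny sketch of that reading (the guest names are placeholders, not taken from the PoC) computes which regions are actually shared and which stay private.

```python
host_regions = {"M1", "M2", "M3"}
guest_maps = {
    "guest-1": {"M1", "M2", "M3"},
    "guest-2": {"M2", "M3"},
}


def shared_regions(maps):
    """Regions mapped by more than one guest."""
    seen, shared = set(), set()
    for regions in maps.values():
        shared |= seen & regions
        seen |= regions
    return shared


def private_regions(maps):
    """Regions mapped by exactly one guest."""
    return set().union(*maps.values()) - shared_regions(maps)
```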
>>>>>>> 
>>>>>>> # POC code
>>>>>>> 
>>>>>>>   Kernel: https://github.com/fengidri/linux-kernel-virtio-ism/commits/ism
>>>>>>>   Qemu:   https://github.com/fengidri/qemu/commits/ism
>>>>>>> 
>>>>>>> If there are any problems, please point them out.
>>>>>>> 
>>>>>>> Hope to hear from you, thank you.
>>>>>>> 
>>>>>>> [1] https://projectacrn.github.io/latest/tutorials/enable_ivshmem.html
>>>>>>> [2] https://dl.acm.org/doi/10.1145/2847562
>>>>>>> [3] https://hal.archives-ouvertes.fr/hal-00368622/document
>>>>>>> [4] https://lwn.net/Articles/711071/
>>>>>>> [5] https://lore.kernel.org/netdev/20220720170048.20806-1-tonylu@linux.alibaba.com/T/
>>>>>>> 
>>>>>>> 
>>>>>>> Xuan Zhuo (2):
>>>>>>>  Reserve device id for ISM device
>>>>>>>  virtio-ism: introduce new device virtio-ism
>>>>>>> 
>>>>>>> content.tex    |   3 +
>>>>>>> virtio-ism.tex | 340 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> 2 files changed, 343 insertions(+)
>>>>>>> create mode 100644 virtio-ism.tex
>>>>>>> 
>>>>>>> --
>>>>>>> 2.32.0.3.g01195cf9f
>>>>>>> 
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 



