* [virtio-comment] About adding a new device type virtio-nvme
@ 2023-01-11 3:21 侯英乐
2023-01-11 15:16 ` Stefan Hajnoczi
0 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-11 3:21 UTC
To: virtio-comment; +Cc: 侯英乐
hi all,
As we know, NVMe has more features than virtio-blk. For example, as virtualization I/O offloading to hardware develops, hardware offload of both virtio-blk and NVMe-oF is advancing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a new device type, virtio-nvme?
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-11 3:21 [virtio-comment] About adding a new device type virtio-nvme 侯英乐
@ 2023-01-11 15:16 ` Stefan Hajnoczi
2023-01-17 2:04 ` 侯英乐
0 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-11 15:16 UTC
To: 侯英乐
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
> As we know, NVMe has more features than virtio-blk. For example, as virtualization I/O offloading to hardware develops, hardware offload of both virtio-blk and NVMe-oF is advancing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a new device type, virtio-nvme?
Hi,
In theory, yes, virtio-nvme can be done. The question is why do it?
NVMe already provides a PCI hardware spec for software and hardware
implementations to follow. An NVMe PCI device can be exposed to the
guest and modern operating systems recognize it without requiring new
drivers.
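A minimal sketch of the driver-matching point above, assuming simplified PCI matching rules: an NVMe controller from any vendor is identified by its PCI class code (base 01h, subclass 08h, programming interface 02h, i.e. 0x010802), so a generic in-box driver binds to it, whereas a modern virtio PCI device is identified by vendor ID 0x1AF4 and device ID 0x1040 plus the virtio device type. A virtio-nvme device would therefore need a newly allocated device type and a new guest driver. The function `pick_driver` is hypothetical, and it ignores transitional virtio IDs and subsystem IDs.

```python
NVME_CLASS_CODE = 0x010802          # mass storage / NVM / NVMe programming interface
VIRTIO_PCI_VENDOR = 0x1AF4          # vendor ID used by virtio PCI devices
VIRTIO_MODERN_FIRST = 0x1040        # modern device ID = 0x1040 + virtio device type
VIRTIO_MODERN_LAST = 0x107F


def pick_driver(vendor_id: int, device_id: int, class_code: int) -> str:
    """Hypothetical, simplified: which generic guest driver could bind to a PCI function."""
    if class_code == NVME_CLASS_CODE:
        # Any vendor's NVMe controller matches on the class code alone.
        return "nvme"
    if vendor_id == VIRTIO_PCI_VENDOR and VIRTIO_MODERN_FIRST <= device_id <= VIRTIO_MODERN_LAST:
        # Virtio drivers match on the device type encoded in the device ID.
        return f"virtio device type {device_id - VIRTIO_MODERN_FIRST}"
    return "unknown"
```

For example, `pick_driver(0x1AF4, 0x1042, 0)` reports virtio device type 2 (block), while an arbitrary vendor/device pair with class code 0x010802 reports "nvme".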
The value of VIRTIO here is probably in the deep integration into the
virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
all these things whereas a PCI device needs to do everything from
scratch.
Let's not forget that virtio-blk is widely used and new commands are
being added as needed. Which NVMe features are you missing in
virtio-blk?
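For reference, a virtio-blk command is a virtqueue request that starts with a 16-byte little-endian header (le32 type, le32 reserved, which Linux calls ioprio, and le64 sector in 512-byte units), followed by the data buffers and a one-byte status. New commands such as DISCARD or WRITE_ZEROES are new values of the type field, negotiated through feature bits. The sketch below packs that header; the helper name `blk_outhdr` is hypothetical and only a subset of request types is listed.

```python
import struct

# Subset of virtio-blk request types.
VIRTIO_BLK_T_IN = 0            # read
VIRTIO_BLK_T_OUT = 1           # write
VIRTIO_BLK_T_FLUSH = 4
VIRTIO_BLK_T_DISCARD = 11      # requires VIRTIO_BLK_F_DISCARD
VIRTIO_BLK_T_WRITE_ZEROES = 13 # requires VIRTIO_BLK_F_WRITE_ZEROES

SECTOR_SIZE = 512  # the sector field is always in 512-byte units


def blk_outhdr(req_type: int, sector: int, ioprio: int = 0) -> bytes:
    """Pack the request header: le32 type, le32 reserved/ioprio, le64 sector."""
    return struct.pack("<IIQ", req_type, ioprio, sector)


# Header for a write starting at byte offset 1 MiB.
hdr = blk_outhdr(VIRTIO_BLK_T_OUT, (1 << 20) // SECTOR_SIZE)
```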
I guess this is why virtio-nvme hasn't been done before: people who want
NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
and so there hasn't been a great need to combine VIRTIO and NVMe yet.
What advantages do you see in having virtio-nvme?
Stefan
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-11 15:16 ` Stefan Hajnoczi
@ 2023-01-17 2:04 ` 侯英乐
2023-01-17 8:32 ` David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-17 2:04 UTC
To: Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>> As we know, NVMe has more features than virtio-blk. For example, as virtualization I/O offloading to hardware develops, hardware offload of both virtio-blk and NVMe-oF is advancing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a new device type, virtio-nvme?
>Hi,
>In theory, yes, virtio-nvme can be done. The question is why do it?
>NVMe already provides a PCI hardware spec for software and hardware
>implementations to follow. An NVMe PCI device can be exposed to the
>guest and modern operating systems recognize it without requiring new
>drivers.
>The value of VIRTIO here is probably in the deep integration into the
>virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>all these things whereas a PCI device needs to do everything from
>scratch.
The NVMe technology and ecosystem are mature. However, in virtualization scenarios, NVMe devices can only use PCIe passthrough. If NVMe and virtio were combined to plug into the vDPA ecosystem, live migration would be supported.
>Let's not forget that virtio-blk is widely used and new commands are
>being added as needed. Which NVMe features are you missing in
>virtio-blk?
With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back-end does not support remote storage, so virtio-nvme-of could combine the advantages of remote storage and virtio live migration.
>I guess this is why virtio-nvme hasn't been done before: people who want
>NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>What advantages do you see in having virtio-nvme?
virtio-nvme advantages:
1) live migration
2) support remote storage
Leo Hou/侯英乐
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 2:04 ` 侯英乐
@ 2023-01-17 8:32 ` David Hildenbrand
2023-01-17 9:30 ` 侯英乐
[not found] ` <202301171730174296359@sudoinfotech.com>
2023-01-17 16:01 ` Stefan Hajnoczi
2023-01-17 17:19 ` Max Gurtovoy
2 siblings, 2 replies; 40+ messages in thread
From: David Hildenbrand @ 2023-01-17 8:32 UTC
To: 侯英乐, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On 17.01.23 03:04, 侯英乐 wrote:
> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>> As we know, NVMe has more features than virtio-blk. For example, as virtualization I/O offloading to hardware develops, hardware offload of both virtio-blk and NVMe-oF is advancing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a new device type, virtio-nvme?
>> Hi,
>> In theory, yes, virtio-nvme can be done. The question is why do it?
>> NVMe already provides a PCI hardware spec for software and hardware
>> implementations to follow. An NVMe PCI device can be exposed to the
>> guest and modern operating systems recognize it without requiring new
>> drivers.
>> The value of VIRTIO here is probably in the deep integration into the
>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>> all these things whereas a PCI device needs to do everything from
>> scratch.
> The NVMe technology and ecosystem are mature. However, in virtualization scenarios, NVMe devices can only use PCIe passthrough. If NVMe and virtio were combined to plug into the vDPA ecosystem, live migration would be supported.
>> Let's not forget that virtio-blk is widely used and new commands are
>> being added as needed. Which NVMe features are you missing in
>> virtio-blk?
> With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back-end does not support remote storage, so virtio-nvme-of could combine the advantages of remote storage and virtio live migration.
>> I guess this is why virtio-nvme hasn't been done before: people who want
>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>> What advantages do you see in having virtio-nvme?
> virtio-nvme advantages:
> 1) live migration
> 2) support remote storage
At least 1) is an implementation detail in the NVME implementation in
the hypervisor. I suspect 2) in a similar way, or is there a fundamental
issue with that?
One problematic thing about the NVME implementation in QEMU is that it
will pin (via vfio) all guest RAM. Could that be avoided using
virtio-NVME, or what exactly would be the difference between virtio-nvme
and ordinary NVME?
--
Thanks,
David / dhildenb
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 8:32 ` David Hildenbrand
@ 2023-01-17 9:30 ` 侯英乐
[not found] ` <202301171730174296359@sudoinfotech.com>
1 sibling, 0 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-17 9:30 UTC
To: David Hildenbrand, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
>On 17.01.23 03:04, 侯英乐 wrote:
>> virtio-nvme advantages:
>> 1) live migration
>> 2) support remote storage
>At least 1) is an implementation detail in the NVME implementation in
>the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>issue with that?
>One problematic thing about the NVME implementation in QEMU is that it
>will pin (via vfio) all guest RAM. Could that be avoided using
>virtio-NVME, or what exactly would be the difference between virtio-nvme
>and ordinary NVME?
In the virtualization scenario where devices are offloaded to hardware:
NVME:
-------------------------------------------------------------------------------
Software
  VM
    Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
    Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
  QEMU:            NVME Instance (gIOVA), vIOMMU (gIOVA-->gPA)
  Host kernel:     VFIO [vfio-pci, vfio_iommu_type1]
Hardware
  DPU:             NVME-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
                     |
                     | TCP (RDMA, and so on)
                     v
Remote storage:    Network Storage
-------------------------------------------------------------------------------
It is difficult to implement PCIe passthrough live migration.
virtio-nvme:
-------------------------------------------------------------------------------
Software
  VM
    Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
    Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
  QEMU:            virtio-NVME Instance, vhost-vdpa (gIOVA), vIOMMU (gIOVA-->gPA)
  Host kernel:     vDPA [vdpa-device (Virtual device)]
Hardware
  DPU:             virtio-nvme-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
                     |
                     | TCP (RDMA, and so on)
                     v
Remote storage:    Network Storage
-------------------------------------------------------------------------------
Because it is based on the vDPA framework, virtio-nvme supports live migration.
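In both diagrams the DPU issues DMA with guest IOVAs, so the physical IOMMU needs a gIOVA-->hPA mapping, which is the composition of the guest's vIOMMU mapping (gIOVA-->gPA) and the host's gPA-->hPA mapping. A toy sketch of that composition, assuming 4 KiB pages and page tables modelled as plain dicts of page numbers; real hypervisors maintain this through shadow mappings or nested IOMMU translation, and `translate`/`compose` are hypothetical helpers.

```python
PAGE_SIZE = 4096  # assumed page size


def translate(table: dict[int, int], addr: int) -> int:
    """Translate addr through a page-granular mapping {input page: output page}."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def compose(first: dict[int, int], second: dict[int, int]) -> dict[int, int]:
    """Apply `first`, then `second`, as one table (e.g. gIOVA->gPA then gPA->hPA)."""
    return {src: second[mid] for src, mid in first.items() if mid in second}


viommu = {0x10: 0x200}                # gIOVA page -> gPA page (guest vIOMMU)
host = {0x200: 0x9000, 0x201: 0x9001}  # gPA page -> hPA page (host mapping)
giova_to_hpa = compose(viommu, host)  # what the physical IOMMU must resolve
```

With these tables, a DMA to gIOVA `0x10 * PAGE_SIZE + 0x34` lands at hPA `0x9000 * PAGE_SIZE + 0x34`.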
Thanks
Leo Hou/houyingle
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
[not found] ` <202301171730174296359@sudoinfotech.com>
@ 2023-01-17 9:41 ` 侯英乐
2023-01-17 15:34 ` Stefan Hajnoczi
0 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-17 9:41 UTC
To: David Hildenbrand, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
>On 17.01.23 03:04, 侯英乐 wrote:
>> virtio-nvme advantages:
>> 1) live migration
>> 2) support remote storage
>At least 1) is an implementation detail in the NVME implementation in
>the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>issue with that?
>One problematic thing about the NVME implementation in QEMU is that it
>will pin (via vfio) all guest RAM. Could that be avoided using
>virtio-NVME, or what exactly would be the difference between virtio-nvme
>and ordinary NVME?
In the virtualization scenario where devices are offloaded to hardware:
NVME:
-------------------------------------------------------------------------------
Software
  VM
    Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
    Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
  QEMU:            NVME Instance (gIOVA), vIOMMU (gIOVA-->gPA)
  Host kernel:     VFIO [vfio-pci, vfio_iommu_type1]
Hardware
  DPU:             NVME-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
                     |
                     | TCP (RDMA, and so on)
                     v
Remote storage:    Network Storage
-------------------------------------------------------------------------------
It is difficult to implement PCIe passthrough live migration.
virtio-nvme:
-------------------------------------------------------------------------------
Software
  VM
    Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
    Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
  QEMU:            virtio-NVME Instance, vhost-vdpa (gIOVA), vIOMMU (gIOVA-->gPA)
  Host kernel:     vDPA [vdpa-device (Virtual device)]
Hardware
  DPU:             virtio-nvme-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
                     |
                     | TCP (RDMA, and so on)
                     v
Remote storage:    Network Storage
-------------------------------------------------------------------------------
Because it is based on the vDPA framework, virtio-nvme supports live migration.
--
Thanks,
Leo Hou/houyingle
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 9:41 ` 侯英乐
@ 2023-01-17 15:34 ` Stefan Hajnoczi
2023-01-17 15:47 ` David Hildenbrand
2023-01-18 2:15 ` 侯英乐
0 siblings, 2 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-17 15:34 UTC
To: 侯英乐
Cc: David Hildenbrand, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
> >On 17.01.23 03:04, 侯英乐 wrote:
> >> virtio-nvme advantages:
> >> 1) live migration
> >> 2) support remote storage
> >At least 1) is an implementation detail in the NVME implementation in
> >the hypervisor. I suspect 2) in a similar way, or is there a fundamental
> >issue with that?
> >One problematic thing about the NVME implementation in QEMU is that it
> >will pin (via vfio) all guest RAM. Could that be avoided using
> >virtio-NVME, or what exactly would be the difference between virtio-nvme
> >and ordinary NVME?
> In the virtualization scenario where devices are offloaded to hardware:
> NVME:
> -------------------------------------------------------------------------------
> Software
>   VM
>     Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
>     Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
>   QEMU:            NVME Instance (gIOVA), vIOMMU (gIOVA-->gPA)
>   Host kernel:     VFIO [vfio-pci, vfio_iommu_type1]
> Hardware
>   DPU:             NVME-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>                      |
>                      | TCP (RDMA, and so on)
>                      v
> Remote storage:    Network Storage
> -------------------------------------------------------------------------------
> It is difficult to implement PCIe passthrough live migration.
Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
device migration protocol v2") defines the VFIO migration API and it's
implemented by several drivers in the kernel.
Can you explain the difficulty of implementing PCIe passthrough live
migration in more detail?
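A simplified model of the VFIO migration v2 state machine that commit defines, assuming only the base arcs (the optional RUNNING_P2P and PRE_COPY states are left out). State names mirror enum vfio_device_mig_state in include/uapi/linux/vfio.h; a request such as RUNNING to STOP_COPY is reached through intermediate states, which this sketch finds with a breadth-first search. It is an illustration under those assumptions, not the kernel's implementation, and `transition_path` is a hypothetical name.

```python
from collections import deque

# Base arcs only; P2P and PRE_COPY states are intentionally omitted.
ARCS = {
    "RUNNING": ["STOP"],
    "STOP": ["RUNNING", "STOP_COPY", "RESUMING"],
    "STOP_COPY": ["STOP"],   # source: device state is read out while in STOP_COPY
    "RESUMING": ["STOP"],    # destination: device state is written while in RESUMING
}


def transition_path(cur: str, new: str) -> list[str]:
    """Shortest sequence of states from cur to new, found by breadth-first search."""
    prev = {cur: None}
    queue = deque([cur])
    while queue:
        state = queue.popleft()
        if state == new:
            path = []
            while state is not None:
                path.append(state)
                state = prev[state]
            return path[::-1]
        for nxt in ARCS[state]:
            if nxt not in prev:
                prev[nxt] = state
                queue.append(nxt)
    raise ValueError(f"no path from {cur} to {new}")
```

For example, `transition_path("RESUMING", "RUNNING")` passes through STOP before the migrated device runs again.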
> virtio-nvme:
> -------------------------------------------------------------------------------
> Software
>   VM
>     Guest user:    SPDK, Mem (gVA) --> (gPA) --> Physical Memory (hPA)
>     Guest kernel:  VFIO [vfio-pci (gIOVA), vfio_iommu_type1]
>   QEMU:            virtio-NVME Instance, vhost-vdpa (gIOVA), vIOMMU (gIOVA-->gPA)
>   Host kernel:     vDPA [vdpa-device (Virtual device)]
> Hardware
>   DPU:             virtio-nvme-of, DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>                      |
>                      | TCP (RDMA, and so on)
>                      v
> Remote storage:    Network Storage
> -------------------------------------------------------------------------------
> Because it is based on the vDPA framework, virtio-nvme supports live migration.
The two diagrams are quite similar. Did you want to highlight a
difference between the two approaches in the diagram?
Stefan
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 15:34 ` Stefan Hajnoczi
@ 2023-01-17 15:47 ` David Hildenbrand
2023-01-18 2:38 ` 侯英乐
2023-01-18 2:15 ` 侯英乐
1 sibling, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2023-01-17 15:47 UTC
To: Stefan Hajnoczi, 侯英乐
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On 17.01.23 16:34, Stefan Hajnoczi wrote:
> On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
>>> On 17.01.23 03:04, 侯英乐 wrote:
>>>> virtio-nvme advantages:
>>>> 1) live migration
>>>> 2) support remote storage
>>> At least 1) is an implementation detail in the NVME implementation in
>>> the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>>> issue with that?
>>> One problematic thing about the NVME implementation in QEMU is that it
>>> will pin (via vfio) all guest RAM. Could that be avoided using
>>> virtio-NVME, or what exactly would be the difference between virtio-nvme
>>> and ordinary NVME?
>> In the virtualization scenario where devices are offloaded to hardware:
>> NVME:
>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>> gIOVA) -> QEMU NVME instance (gIOVA) with vIOMMU (gIOVA-->gPA) -> host
>> VFIO (vfio-pci, vfio_iommu_type1) -> DPU running NVME-of. Device DMA
>> (gIOVA) passes through the IOMMU (gIOVA-->hPA) to physical memory (hPA);
>> the DPU reaches network storage over TCP (RDMA, and so on).]
>>
>>
>> It is difficult to implement PCIe passthrough live migration.
>
> Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
> device migration protocol v2") defines the VFIO migration API and it's
> implemented by several drivers in the kernel.
>
> Can you explain the difficulty of implementing PCIe passthrough live
> migration in more detail?
>
>>
>>
>>
>>
>> virtio-nvme:
>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>> gIOVA) -> QEMU virtio-NVME instance on vhost-vdpa (gIOVA) with vIOMMU
>> (gIOVA-->gPA) -> host vDPA with a vdpa-device (virtual device) -> DPU
>> running virtio-nvme-of. Device DMA (gIOVA) passes through the IOMMU
>> (gIOVA-->hPA) to physical memory (hPA); the DPU reaches network storage
>> over TCP (RDMA, and so on).]
>> Based on the vDPA framework, it supports live migration.
>
> The two diagrams are quite similar. Did you want to highlight a
> difference between the two approaches in the diagram?
I also wondered why exactly virtio-nvme is needed, and why one couldn't
write a different "backend" for an ordinary NVMe device that talks to
the vdpa-device.
--
Thanks,
David / dhildenb
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 2:04 ` 侯英乐
2023-01-17 8:32 ` David Hildenbrand
@ 2023-01-17 16:01 ` Stefan Hajnoczi
[not found] ` <20230117162114.GA24976@lst.de>
` (2 more replies)
2023-01-17 17:19 ` Max Gurtovoy
2 siblings, 3 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-17 16:01 UTC (permalink / raw)
To: 侯英乐
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, Jan 17, 2023 at 10:04:07AM +0800, 侯英乐 wrote:
> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>
>
> >>On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
> >> As we know, NVMe has more features than virtio-blk. For example, with the development of offloading virtualization I/O to hardware, both virtio-blk and NVMe-oF hardware offloading are developing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a virtio-nvme device type?
>
>
>
>
>
>
>
> >Hi,
> >In theory, yes, virtio-nvme can be done. The question is why do it?
>
>
>
> >NVMe already provides a PCI hardware spec for software and hardware
> >implementations to follow. An NVMe PCI device can be exposed to the
> >guest and modern operating systems recognize it without requiring new
> >drivers.
>
>
> >The value of VIRTIO here is probably in the deep integration into the
> >virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
> >all these things whereas a PCI device needs to do everything from
> >scratch.
>
>
> NVMe technology and its ecosystem are mature. However, in virtualization scenarios, NVMe devices can only be used via PCIe passthrough. When NVMe and virtio are combined to plug into the vDPA ecosystem, live migration becomes possible.
>
>
> >Let's not forget that virtio-blk is widely used and new commands are
> >being added as needed. Which NVMe features are you missing in
> >virtio-blk?
>
> With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back end does not support remote storage. Therefore, virtio-nvme-of could combine the advantages of remote storage and virtio live migration.
virtio-blk is just a storage interface; whether that storage is local or
remote is up to the device implementation. The block device could be
located on Ceph, NFS, etc.
Each virtio-blk device is a single block device. There is no
standardized management protocol in virtio-blk for connecting to remote
block devices. I'm aware of hardware virtio-blk devices that connect to
remote storage. Configuration is performed through an out-of-band
management interface.
Maybe when you say virtio-blk doesn't support remote storage this is
what you mean?
Stefan
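Stefan's point, that virtio-blk only defines the storage interface while local versus remote is a property of the device implementation, can be illustrated with a small model. This is a hypothetical sketch, not QEMU or any real device code: `BlockBackend`, `LocalBackend`, `RemoteBackend` and `BlockFrontend` are invented names, and the "remote" backend is simulated in-process rather than talking to Ceph, NFS or NVMe-oF.

```python
from abc import ABC, abstractmethod

SECTOR_SIZE = 512  # virtio-blk requests address the disk in 512-byte sectors


class BlockBackend(ABC):
    """Hypothetical backend interface: where the data lives is hidden here."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class LocalBackend(BlockBackend):
    """Local storage, simulated with an in-memory buffer."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.buf[offset:offset + length])

    def write(self, offset: int, data: bytes) -> None:
        self.buf[offset:offset + len(data)] = data


class RemoteBackend(BlockBackend):
    """Remote storage (Ceph, NFS, NVMe-oF, ...), simulated in-process.

    A real backend would send each request over the network; here the
    "remote target" is another backend and we only count round trips.
    """

    def __init__(self, target: BlockBackend) -> None:
        self.target = target
        self.round_trips = 0

    def read(self, offset: int, length: int) -> bytes:
        self.round_trips += 1
        return self.target.read(offset, length)

    def write(self, offset: int, data: bytes) -> None:
        self.round_trips += 1
        self.target.write(offset, data)


class BlockFrontend:
    """What the guest driver sees: sector reads/writes, whatever the backend."""

    def __init__(self, backend: BlockBackend) -> None:
        self.backend = backend

    def read_sectors(self, sector: int, count: int) -> bytes:
        return self.backend.read(sector * SECTOR_SIZE, count * SECTOR_SIZE)

    def write_sectors(self, sector: int, data: bytes) -> None:
        if len(data) % SECTOR_SIZE:
            raise ValueError("data must be a whole number of sectors")
        self.backend.write(sector * SECTOR_SIZE, data)
```

`BlockFrontend(LocalBackend(4096))` and `BlockFrontend(RemoteBackend(LocalBackend(4096)))` accept identical `write_sectors`/`read_sectors` calls; only the backend knows that a network is involved. Choosing which remote volume a backend connects to happens outside this interface, which mirrors the out-of-band management Stefan describes.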
* Re: [virtio-comment] About adding a new device type virtio-nvme
[not found] ` <20230117162114.GA24976@lst.de>
@ 2023-01-17 16:53 ` Stefan Hajnoczi
2023-01-18 11:22 ` Michael S. Tsirkin
1 sibling, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-17 16:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: 侯英乐, virtio-comment, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, Jan 17, 2023 at 05:21:14PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 17, 2023 at 11:01:37AM -0500, Stefan Hajnoczi wrote:
> > Each virtio-blk device is a single block device. There is no
> > standardized management protocol in virtio-blk for connecting to remote
> > block devices. I'm aware of hardware virtio-blk devices that connect to
> > remote storage. Configuration is performed through an out-of-band
> > management interface.
> >
> > Maybe when you say virtio-blk doesn't support remote storage this is
> > what you mean?
>
> That makes sense, but the same is true of non-fabrics nvme transports
> like nvme-pci and a hypothetical nvme-virtio.
Just to confirm (my NVMe knowledge is not great): "6 Fabrics Command
Set", "5.22 Namespace Attachment command", etc. cannot be used over
nvme-pci to connect remote storage if a device supports those commands?
https://nvmexpress.org/wp-content/uploads/NVM-Express-Base-Specification-2.0c-2022.10.04-Ratified.pdf
Stefan
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 2:04 ` 侯英乐
2023-01-17 8:32 ` David Hildenbrand
2023-01-17 16:01 ` Stefan Hajnoczi
@ 2023-01-17 17:19 ` Max Gurtovoy
2023-01-18 3:23 ` 侯英乐
2 siblings, 1 reply; 40+ messages in thread
From: Max Gurtovoy @ 2023-01-17 17:19 UTC (permalink / raw)
To: 侯英乐, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On 17/01/2023 4:04, 侯英乐 wrote:
> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>
>
>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>> As we know, NVMe has more features than virtio-blk. For example, with the development of offloading virtualization I/O to hardware, both virtio-blk and NVMe-oF hardware offloading are developing rapidly. So if virtio and NVMe were combined into virtio-nvme, would it be necessary to add a virtio-nvme device type?
>
>
>
>
>
>
>> Hi,
>> In theory, yes, virtio-nvme can be done. The question is why do it?
>
>
>> NVMe already provides a PCI hardware spec for software and hardware
>> implementations to follow. An NVMe PCI device can be exposed to the
>> guest and modern operating systems recognize it without requiring new
>> drivers.
>
>> The value of VIRTIO here is probably in the deep integration into the
>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>> all these things whereas a PCI device needs to do everything from
>> scratch.
>
> NVMe technology and its ecosystem are mature. However, in virtualization scenarios, NVMe devices can only be used via PCIe passthrough. When NVMe and virtio are combined to plug into the vDPA ecosystem, live migration becomes possible.
>
>
>> Let's not forget that virtio-blk is widely used and new commands are
>> being added as needed. Which NVMe features are you missing in
>> virtio-blk?
> With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back end does not support remote storage. Therefore, virtio-nvme-of could combine the advantages of remote storage and virtio live migration.
>
>
>
>> I guess this is why virtio-nvme hasn't been done before: people who want
>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>
>> What advantages do you see in having virtio-nvme?
>
>
> virtio-nvme advantages:
> 1) live migration
This is WIP and will use the VFIO live migration framework.
> 2) support remote storage
There are solutions today that can use remote storage as an NVMe
namespace, for example DPU-based NVMe devices such as NVIDIA's NVMe
SNAP device.
>
>
>
> Leo Hou/侯英乐
>
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 15:34 ` Stefan Hajnoczi
2023-01-17 15:47 ` David Hildenbrand
@ 2023-01-18 2:15 ` 侯英乐
2023-01-18 14:08 ` Stefan Hajnoczi
` (2 more replies)
1 sibling, 3 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-18 2:15 UTC (permalink / raw)
To: Stefan Hajnoczi, jasowang
Cc: David Hildenbrand, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>> >On 17.01.23 03:04, 侯英乐 wrote:
>>
>>
>>
>> >> virtio-nvme advantages:
>> >> 1) live migration
>> >> 2) support remote storage
>>
>>
>>
>> >At least 1) is an implementation detail in the NVME implementation in
>> >the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>> >issue with that?
>>
>>
>>
>> >One problematic thing about the NVME implementation in QEMU is that it
>> >will pin (via vfio) all guest RAM. Could that be avoided using
>> >virtio-NVME, or what exactly would be the difference between virtio-nvme
>> >and ordinary NVME?
>>
>>
>>
>> In the virtualization scenario where devices are offloaded to hardware:
>>
>>
>> NVME:
>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>> gIOVA) -> QEMU NVME instance (gIOVA) with vIOMMU (gIOVA-->gPA) -> host
>> VFIO (vfio-pci, vfio_iommu_type1) -> DPU running NVME-of. Device DMA
>> (gIOVA) passes through the IOMMU (gIOVA-->hPA) to physical memory (hPA);
>> the DPU reaches network storage over TCP (RDMA, and so on).]
>>
>>
>> It is difficult to implement PCIe passthrough live migration.
>Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
>device migration protocol v2") defines the VFIO migration API and it's
>implemented by several drivers in the kernel.
Yes, this commit adds VFIO live migration support, but the feature is still a work in progress;
see the recent submission: https://lore.kernel.org/all/20230116141135.12021-10-avihaih@nvidia.com/
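For readers unfamiliar with the protocol referenced above, here is a toy model of the device state machine that the VFIO migration protocol v2 describes, as understood here: base states RUNNING, STOP, STOP_COPY and RESUMING plus ERROR, with single-step arcs RUNNING<->STOP, STOP<->STOP_COPY and STOP<->RESUMING. Later kernels add optional states (such as RUNNING_P2P and PRE_COPY) that are left out. The BFS helper `transition_path` is hypothetical and makes no claim about how the kernel itself chains transitions.

```python
from collections import deque
from enum import Enum
from typing import Dict, List, Optional, Tuple


class MigState(Enum):
    """Base device states of the VFIO migration protocol v2 (simplified)."""
    ERROR = "ERROR"
    STOP = "STOP"
    RUNNING = "RUNNING"
    STOP_COPY = "STOP_COPY"
    RESUMING = "RESUMING"


# Single-step arcs of the base protocol, as understood here. Optional states
# added by later kernels (e.g. RUNNING_P2P, PRE_COPY) are deliberately omitted.
ARCS: Dict[MigState, Tuple[MigState, ...]] = {
    MigState.RUNNING: (MigState.STOP,),
    MigState.STOP: (MigState.RUNNING, MigState.STOP_COPY, MigState.RESUMING),
    MigState.STOP_COPY: (MigState.STOP,),
    MigState.RESUMING: (MigState.STOP,),
    MigState.ERROR: (),  # no outgoing arcs in this toy model
}


def transition_path(src: MigState, dst: MigState) -> List[MigState]:
    """Shortest chain of single-step arcs from src to dst, found with BFS."""
    prev: Dict[MigState, Optional[MigState]] = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt in ARCS[cur]:
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    if dst not in prev:
        raise ValueError(f"no path from {src.name} to {dst.name}")
    steps: List[MigState] = []
    node: Optional[MigState] = dst
    while node is not None:
        steps.append(node)
        node = prev[node]
    return steps[::-1]
```

In this model, `transition_path(MigState.RUNNING, MigState.STOP_COPY)` yields RUNNING, STOP, STOP_COPY, the source-side sequence in which device state is read out, and `transition_path(MigState.RESUMING, MigState.RUNNING)` yields RESUMING, STOP, RUNNING on the destination after the state has been loaded.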
>Can you explain the difficulty of implementing PCIe passthrough live
>migration in more detail?
VFIO live migration requires the IOMMU to support dirty page tracking. Currently,
no IOMMU device supports this feature, so VFIO live migration will take a long time.
For details, see: https://www.qemu.org/docs/master/devel/vfio-migration.html
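A toy model of why dirty page tracking matters for pre-copy migration. It assumes that, without IOMMU dirty tracking, the hypervisor cannot tell which DMA-mapped pages the device wrote and must treat every mapped page as dirty in every round. The function `precopy`, the round limit and the stop threshold are hypothetical simplifications, not QEMU's actual convergence logic.

```python
from typing import Tuple


def precopy(total_pages: int, dirtied_per_round: int, tracking: bool,
            max_rounds: int = 5, stop_threshold: int = 64) -> Tuple[int, int]:
    """Toy pre-copy migration model.

    Returns (pages_sent_in_total, pages_copied_while_the_vm_is_stopped).
    Every page is copied once up front. Each later round re-sends the pages
    considered dirty: with dirty tracking that is only what was written
    since the last round; without it, every DMA-mapped page must be assumed
    dirty. Once the dirty set is at most stop_threshold (or max_rounds is
    reached), the VM is paused and the remaining dirty pages are copied.
    """
    sent = total_pages            # initial full copy while the VM runs
    dirty = total_pages
    for _ in range(max_rounds):
        dirty = dirtied_per_round if tracking else total_pages
        if dirty <= stop_threshold:
            break                 # small enough for the final stop-and-copy
        sent += dirty             # another live round
    sent += dirty                 # final stop-and-copy with the VM paused
    return sent, dirty
```

With 262144 pages (1 GiB of 4 KiB pages) and 32 pages dirtied per round, `precopy(262144, 32, tracking=True)` returns (262176, 32), whereas `precopy(262144, 32, tracking=False)` returns (1835008, 262144): in the untracked case pre-copy never converges and all of guest memory is copied while the VM is paused.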
>> virtio-nvme:
>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>> gIOVA) -> QEMU virtio-NVME instance on vhost-vdpa (gIOVA) with vIOMMU
>> (gIOVA-->gPA) -> host vDPA with a vdpa-device (virtual device) -> DPU
>> running virtio-nvme-of. Device DMA (gIOVA) passes through the IOMMU
>> (gIOVA-->hPA) to physical memory (hPA); the DPU reaches network storage
>> over TCP (RDMA, and so on).]
>> Based on the vDPA framework, it supports live migration.
>The two diagrams are quite similar. Did you want to highlight a
>difference between the two approaches in the diagram?
The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
is a pillar in productizing the end-to-end vDPA solution, and it enables NIC vendors to integrate their vDPA NIC kernel
drivers into the framework as part of their productization efforts.
For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
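Both diagrams share the same DMA address chain: the guest programs its vIOMMU with gIOVA-->gPA mappings, while the physical IOMMU translates gIOVA-->hPA. A minimal sketch, assuming page-granular dictionary tables and hypothetical helper names (`compose`, `translate`), of how such a host table can be derived by composing the guest's vIOMMU table with the hypervisor's gPA-->hPA mapping:

```python
from typing import Dict

PAGE = 4096  # assumed page size for this sketch


def compose(viommu: Dict[int, int], gpa_to_hpa: Dict[int, int]) -> Dict[int, int]:
    """Build a gIOVA page -> hPA page table from gIOVA->gPA and gPA->hPA.

    Both inputs map page-aligned addresses to page-aligned addresses. A gIOVA
    whose gPA is not backed by host memory is simply left unmapped.
    """
    return {
        giova: gpa_to_hpa[gpa]
        for giova, gpa in viommu.items()
        if gpa in gpa_to_hpa
    }


def translate(table: Dict[int, int], addr: int) -> int:
    """Translate an address through a page table, keeping the page offset."""
    base, offset = addr - addr % PAGE, addr % PAGE
    if base not in table:
        raise LookupError(f"DMA fault: {addr:#x} is not mapped")
    return table[base] + offset
```

In this model every change the guest makes to its vIOMMU table has to be re-composed into the host table, and that step is the same whether the device sits behind VFIO or vDPA, consistent with both diagrams showing identical vIOMMU and IOMMU boxes.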
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 15:47 ` David Hildenbrand
@ 2023-01-18 2:38 ` 侯英乐
0 siblings, 0 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-18 2:38 UTC (permalink / raw)
To: David Hildenbrand, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 16:47:23 +0100, David wrote:
>On 17.01.23 16:34, Stefan Hajnoczi wrote:
>> On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>>> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>>>> On 17.01.23 03:04, 侯英乐 wrote:
>>>
>>>
>>>
>>>>> virtio-nvme advantages:
>>>>> 1) live migration
>>>>> 2) support remote storage
>>>
>>>
>>>
>>>> At least 1) is an implementation detail in the NVME implementation in
>>>> the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>>>> issue with that?
>>>
>>>
>>>
>>>> One problematic thing about the NVME implementation in QEMU is that it
>>>> will pin (via vfio) all guest RAM. Could that be avoided using
>>>> virtio-NVME, or what exactly would be the difference between virtio-nvme
>>>> and ordinary NVME?
>>>
>>>
>>>
>>> In the virtualization scenario where devices are offloaded to hardware:
>>>
>>>
>>> NVME:
>>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>>> gIOVA) -> QEMU NVME instance (gIOVA) with vIOMMU (gIOVA-->gPA) -> host
>>> VFIO (vfio-pci, vfio_iommu_type1) -> DPU running NVME-of. Device DMA
>>> (gIOVA) passes through the IOMMU (gIOVA-->hPA) to physical memory (hPA);
>>> the DPU reaches network storage over TCP (RDMA, and so on).]
>>>
>>>
>>> It is difficult to implement PCIe passthrough live migration.
>>
>> Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
>> device migration protocol v2") defines the VFIO migration API and it's
>> implemented by several drivers in the kernel.
>>
>> Can you explain the difficulty of implementing PCIe passthrough live
>> migration in more detail?
>>
>>>
>>>
>>>
>>>
>>> virtio-nvme:
>>> [ASCII diagram; its alignment was lost in the archive. Recoverable
>>> content: guest SPDK (gVA) -> guest VFIO (vfio-pci, vfio_iommu_type1;
>>> gIOVA) -> QEMU virtio-NVME instance on vhost-vdpa (gIOVA) with vIOMMU
>>> (gIOVA-->gPA) -> host vDPA with a vdpa-device (virtual device) -> DPU
>>> running virtio-nvme-of. Device DMA (gIOVA) passes through the IOMMU
>>> (gIOVA-->hPA) to physical memory (hPA); the DPU reaches network storage
>>> over TCP (RDMA, and so on).]
>>> Based on the vDPA framework, it supports live migration.
>>
>> The two diagrams are quite similar. Did you want to highlight a
>> difference between the two approaches in the diagram?
>I also wondered why virtio-nvme is exactly needed,
Traditional NVMe devices can only rely on the VFIO framework for
PCIe passthrough in virtualization scenarios. The current VFIO framework
does not support live migration well.
In addition to the VFIO-based PCIe passthrough, virtio-nvme physical
hardware can be combined with the vDPA framework to achieve data
plane acceleration.
>and why one couldn't
>write a different "backend" for an ordinary NVME device, that talks to
>the vdpa-device.
The vDPA (virtio data path acceleration) kernel framework is a pillar in
productizing the end-to-end vDPA solution and it enables NIC vendors
to integrate their vDPA NIC kernel drivers into the framework as part of
their productization efforts.
So a plain NVMe device can't talk to the vDPA framework, because vDPA expects the device's data path to follow the virtio specification.
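To illustrate why the data path format matters here (an editorial sketch, not something stated in the thread): a vDPA device's hardware must consume virtio virtqueues directly, while an NVMe controller consumes 64-byte submission queue entries. The Python below packs a virtio-blk read as a chain of 16-byte split-virtqueue descriptors and the equivalent NVMe Read command, assuming the little-endian layouts as I understand them from the VIRTIO 1.x and NVMe base specifications. The buffer addresses and helper function names are hypothetical.

```python
import struct

# Split-virtqueue descriptor flags (VIRTIO 1.x).
VIRTQ_DESC_F_NEXT = 1   # chain continues at `next`
VIRTQ_DESC_F_WRITE = 2  # buffer is device-writable

VIRTIO_BLK_T_IN = 0     # virtio-blk read request type
NVME_CMD_READ = 0x02    # NVMe NVM command set Read opcode


def virtq_desc(addr: int, length: int, flags: int, next_idx: int) -> bytes:
    """Pack a 16-byte split-virtqueue descriptor: le64 addr, le32 len, le16 flags, le16 next."""
    return struct.pack("<QIHH", addr, length, flags, next_idx)


def virtio_blk_req_header(req_type: int, sector: int) -> bytes:
    """Pack the 16-byte virtio-blk request header: le32 type, le32 reserved, le64 sector."""
    return struct.pack("<IIQ", req_type, 0, sector)


def virtio_blk_read_chain(hdr_addr: int, buf_addr: int, status_addr: int, length: int) -> list[bytes]:
    """Descriptor chain for a read: header (device reads), data and status byte (device writes)."""
    return [
        virtq_desc(hdr_addr, 16, VIRTQ_DESC_F_NEXT, 1),
        virtq_desc(buf_addr, length, VIRTQ_DESC_F_NEXT | VIRTQ_DESC_F_WRITE, 2),
        virtq_desc(status_addr, 1, VIRTQ_DESC_F_WRITE, 0),
    ]


def nvme_read_sqe(cid: int, nsid: int, prp1: int, slba: int, nlb: int) -> bytes:
    """Pack a 64-byte NVMe Read submission queue entry (single PRP, no metadata)."""
    cdw0 = NVME_CMD_READ | (cid << 16)  # opcode in bits 7:0, command ID in bits 31:16
    return struct.pack(
        "<IIQQQQIIIIII",
        cdw0,                 # CDW0
        nsid,                 # CDW1: namespace ID
        0,                    # CDW2-3: reserved
        0,                    # MPTR: metadata pointer
        prp1,                 # DPTR: PRP entry 1
        0,                    # DPTR: PRP entry 2 (unused for a single page)
        slba & 0xFFFFFFFF,    # CDW10: starting LBA, low 32 bits
        slba >> 32,           # CDW11: starting LBA, high 32 bits
        nlb - 1,              # CDW12: number of logical blocks, 0-based
        0, 0, 0,              # CDW13-15
    )


# A 4 KiB read at byte offset 4096, assuming a namespace formatted with 4 KiB LBAs.
chain = virtio_blk_read_chain(0x1000, 0x2000, 0x3000, 4096)
header = virtio_blk_req_header(VIRTIO_BLK_T_IN, sector=8)  # sector counts 512-byte units
sqe = nvme_read_sqe(cid=1, nsid=1, prp1=0x2000, slba=1, nlb=1)
```

Note that the virtio-blk `sector` field always counts 512-byte units, while the NVMe starting LBA is in units of the namespace's formatted LBA size, so the same byte offset can be expressed by different numbers in the two formats.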
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 16:01 ` Stefan Hajnoczi
[not found] ` <20230117162114.GA24976@lst.de>
@ 2023-01-18 2:49 ` 侯英乐
2023-02-05 12:33 ` Michael S. Tsirkin
2023-01-19 11:42 ` Michael S. Tsirkin
2 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-18 2:49 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 11:01:37 -0500, Stefan wrote:
>On Tue, Jan 17, 2023 at 10:04:07AM +0800, 侯英乐 wrote:
>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>
>>
>> >>On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>> >> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, Is it necessary to add a device type Virtio-NvMe ?
>>
>> >Hi,
>> >In theory, yes, virtio-nvme can be done. The question is why do it?
>>
>> >NVMe already provides a PCI hardware spec for software and hardware
>> >implementations to follow. An NVMe PCI device can be exposed to the
>> >guest and modern operating systems recognize it without requiring new
>> >drivers.
>>
>>
>> >The value of VIRTIO here is probably in the deep integration into the
>> >virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>> >all these things whereas a PCI device needs to do everything from
>> >scratch.
>>
>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined to connect to the vDPA ecosystem, live migration is supported.
>>
>>
>> >Let's not forget that virtio-blk is widely used and new commands are
>> >being added as needed. Which NVMe features are you missing in
>> >virtio-blk?
>>
>> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The virtio-blk back-end does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>
>virtio-blk is just a storage interface, whether that storage is local or
>remote is up to the device implementation. The block device could be
>located on Ceph, NFS, etc.
>
>Each virtio-blk device is a single block device. There is no
>standardized management protocol in virtio-blk for connecting to remote
>block devices. I'm aware of hardware virtio-blk devices that connect to
>remote storage. Configuration is performed through an out-of-band
>management interface.
>
>Maybe when you say virtio-blk doesn't support remote storage this is
>what you mean?
Yes. Virtio-blk devices are offloaded to hardware, for example to DPUs and SmartNICs.
So I am comparing virtio-nvme with both virtio-blk and NVMe.
>Stefan
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 17:19 ` Max Gurtovoy
@ 2023-01-18 3:23 ` 侯英乐
2023-01-18 10:09 ` Max Gurtovoy
2023-01-18 10:28 ` Michael S. Tsirkin
0 siblings, 2 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-18 3:23 UTC (permalink / raw)
To: Max Gurtovoy, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
>On 17/01/2023 4:04, 侯英乐 wrote:
>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>
>>
>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>>> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, Is it necessary to add a device type Virtio-NvMe ?
>>
>>> Hi,
>>> In theory, yes, virtio-nvme can be done. The question is why do it?
>>
>>
>>> NVMe already provides a PCI hardware spec for software and hardware
>>> implementations to follow. An NVMe PCI device can be exposed to the
>>> guest and modern operating systems recognize it without requiring new
>>> drivers.
>>
>>> The value of VIRTIO here is probably in the deep integration into the
>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>>> all these things whereas a PCI device needs to do everything from
>>> scratch.
>>
>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined to connect to the vDPA ecosystem, live migration is supported.
>>
>>
>>> Let's not forget that virtio-blk is widely used and new commands are
>>> being added as needed. Which NVMe features are you missing in
>>> virtio-blk?
>> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The virtio-blk back-end does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>
>>
>>
>>> I guess this is why virtio-nvme hasn't been done before: people who want
>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>>
>>> What advantages do you see in having virtio-nvme?
>>
>>
>> virtio-nvme advantages :
>> 1) live migration
>
>
>This is WIP and will use VFIO live migration framework.
Yes, the VFIO live migration framework is WIP, but I still think vDPA is a friendlier framework.
>
>
>> 2) support remote storage
>
>
>There are solutions today that can use remote storage as an NVMe
>Namespace. For example, DPU based NVMe device such as NVIDIA'S NVMe SNAP
>device.
Yes, you're right. NVMe has a built-in advantage over virtio-blk for hardware offloading.
The reason I propose virtio-nvme is to combine NVMe and virtio, so that NVMe
can fit into the virtio ecosystem, such as vDPA, based on the virtio interface specification.
--
Leo Hou/houyingle
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 3:23 ` 侯英乐
@ 2023-01-18 10:09 ` Max Gurtovoy
2023-01-18 11:12 ` Michael S. Tsirkin
2023-01-19 10:19 ` 侯英乐
2023-01-18 10:28 ` Michael S. Tsirkin
1 sibling, 2 replies; 40+ messages in thread
From: Max Gurtovoy @ 2023-01-18 10:09 UTC (permalink / raw)
To: 侯英乐, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On 18/01/2023 5:23, 侯英乐 wrote:
> On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
>
>> On 17/01/2023 4:04, 侯英乐 wrote:
>>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>>>> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, Is it necessary to add a device type Virtio-NvMe ?
>>>
>>>
>>>> Hi,
>>>> In theory, yes, virtio-nvme can be done. The question is why do it?
>>>> NVMe already provides a PCI hardware spec for software and hardware
>>>> implementations to follow. An NVMe PCI device can be exposed to the
>>>> guest and modern operating systems recognize it without requiring new
>>>> drivers.
>>>> The value of VIRTIO here is probably in the deep integration into the
>>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>>>> all these things whereas a PCI device needs to do everything from
>>>> scratch.
>>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined to connect to the vDPA ecosystem, live migration is supported.
>>>> Let's not forget that virtio-blk is widely used and new commands are
>>>> being added as needed. Which NVMe features are you missing in
>>>> virtio-blk?
>>> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The virtio-blk back-end does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>>> I guess this is why virtio-nvme hasn't been done before: people who want
>>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>>>> What advantages do you see in having virtio-nvme?
>>> virtio-nvme advantages :
>>> 1) live migration
>>
>> This is WIP and will use VFIO live migration framework.
> Yes, the VFIO live migration framework is WIP, but I still think vDPA is a friendlier framework.
Not sure what you consider friendly?
The community agreed that in SR-IOV - VF migration is done via PF interface.
Any device specific migration (e.g. vdpa/virtio) is not as generic as
VFIO migration. Also it will be maintained by a smaller group of engineers.
If you would like to use vDPA, I suggest using virtio-blk rather than
inventing a virtio-nvme device that will certainly have a smaller feature set
than pure NVMe.
In case you're missing some feature in virtio-blk that exist in NVMe,
you're welcome to submit a proposal to the technical group with that
feature.
NVIDIA also has a DPU-based physical virtio-blk device (NVIDIA's
virtio-blk SNAP) that supports SR-IOV and remote storage access.
The live migration specification is WIP in both the NVMe and virtio working
groups. I can't say which will be merged first.
>
>
>>
>>> 2) support remote storage
>> There are solutions today that can use remote storage as an NVMe
>> Namespace. For example, DPU based NVMe device such as NVIDIA'S NVMe SNAP
>> device.
>
>
> Yes, you're right. NVMe has a built-in advantage over virtio-blk for hardware offloading.
> The reason I propose virtio-nvme is to combine NVMe and virtio, so that NVMe
> can fit into the virtio ecosystem, such as vDPA, based on the virtio interface specification.
>
>
>
> --
> Leo Hou/houyingle
>
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 3:23 ` 侯英乐
2023-01-18 10:09 ` Max Gurtovoy
@ 2023-01-18 10:28 ` Michael S. Tsirkin
1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-01-18 10:28 UTC (permalink / raw)
To: 侯英乐
Cc: Max Gurtovoy, Stefan Hajnoczi, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare
On Wed, Jan 18, 2023 at 11:23:43AM +0800, 侯英乐 wrote:
> Yes, the VFIO live migration framework is WIP, but I still think vDPA is a friendlier framework.
Thanks, we try!
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 10:09 ` Max Gurtovoy
@ 2023-01-18 11:12 ` Michael S. Tsirkin
2023-01-18 11:27 ` Max Gurtovoy
2023-01-19 10:19 ` 侯英乐
1 sibling, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-01-18 11:12 UTC (permalink / raw)
To: Max Gurtovoy
Cc: 侯英乐, Stefan Hajnoczi, virtio-comment,
Christoph Hellwig, Keith Busch, Kevin Wolf, Klaus Jensen,
sgarzare
On Wed, Jan 18, 2023 at 12:09:59PM +0200, Max Gurtovoy wrote:
> The community agreed that in SR-IOV - VF migration is done via PF interface.
Which community? You are sending this to virtio comment ML.
At this point we don't yet have finalized support for migration in
the virtio spec, just some WIP. While yes, this WIP is using
PF for migration of VFs I think it is premature to claim that
it's a done deal. E.g. Jason Wang is interested in looking
into an in-band memory mapped transport for that. Probably in
addition to the option of using the PF, not instead of it.
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
[not found] ` <20230117162114.GA24976@lst.de>
2023-01-17 16:53 ` Stefan Hajnoczi
@ 2023-01-18 11:22 ` Michael S. Tsirkin
1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-01-18 11:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Stefan Hajnoczi, 侯英乐, virtio-comment,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare
On Tue, Jan 17, 2023 at 05:21:14PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 17, 2023 at 11:01:37AM -0500, Stefan Hajnoczi wrote:
> > Each virtio-blk device is a single block device. There is no
> > standardized management protocol in virtio-blk for connecting to remote
> > block devices. I'm aware of hardware virtio-blk devices that connect to
> > remote storage. Configuration is performed through an out-of-band
> > management interface.
> >
> > Maybe when you say virtio-blk doesn't support remote storage this is
> > what you mean?
>
> That makes sense, but the same is true of non-fabrics nvme transports
> like nvme-pci and a hypothetical nvme-virtio.
I can hypothesise and come up with more theoretical use cases
but I feel the question remains what motivates this proposal
initially.
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 11:12 ` Michael S. Tsirkin
@ 2023-01-18 11:27 ` Max Gurtovoy
2023-01-18 13:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 40+ messages in thread
From: Max Gurtovoy @ 2023-01-18 11:27 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: 侯英乐, Stefan Hajnoczi, virtio-comment,
Christoph Hellwig, Keith Busch, Kevin Wolf, Klaus Jensen,
sgarzare
On 18/01/2023 13:12, Michael S. Tsirkin wrote:
> On Wed, Jan 18, 2023 at 12:09:59PM +0200, Max Gurtovoy wrote:
>> The community agreed that in SR-IOV - VF migration is done via PF interface.
> Which community? You are sending this to virtio comment ML.
I meant that the Linux community is moving in the direction of a management device +
managed device pair.
The management device manages the migration process, which is driven by the
migration SW.
The managed device is in the guest, which is not aware of the process.
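A minimal sketch of the management-device/managed-device split described above, assuming a hypothetical PF-side admin interface: `suspend`, `save_state`, `restore_state` and `resume` are illustrative names, not commands from the virtio or NVMe specifications. What it shows is that the migration software talks only to the PFs on the source and destination hosts; the VF assigned to the guest never receives a migration command.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedVF:
    """A VF assigned to a guest; the guest driver never sees migration commands."""
    vf_id: int
    running: bool = True
    device_state: dict = field(default_factory=dict)


class ManagementPF:
    """Hypothetical PF-side admin interface that drives migration of its VFs."""

    def __init__(self) -> None:
        self.vfs: dict[int, ManagedVF] = {}

    def add_vf(self, vf: ManagedVF) -> None:
        self.vfs[vf.vf_id] = vf

    def suspend(self, vf_id: int) -> None:
        self.vfs[vf_id].running = False

    def save_state(self, vf_id: int) -> dict:
        vf = self.vfs[vf_id]
        if vf.running:
            raise RuntimeError("VF must be suspended before its state is saved")
        return dict(vf.device_state)

    def restore_state(self, vf_id: int, state: dict) -> None:
        vf = self.vfs[vf_id]
        if vf.running:
            raise RuntimeError("VF must be suspended before its state is restored")
        vf.device_state = dict(state)

    def resume(self, vf_id: int) -> None:
        self.vfs[vf_id].running = True


def migrate(src: ManagementPF, dst: ManagementPF, vf_id: int) -> None:
    """Migration software issues every command through the PFs, never through the VF."""
    src.suspend(vf_id)
    state = src.save_state(vf_id)
    dst.suspend(vf_id)  # hold the destination VF stopped until its state arrives
    dst.restore_state(vf_id, state)
    dst.resume(vf_id)
```

Usage: create a `ManagedVF(1, device_state={"sq_head": 5})` under a source `ManagementPF`, an empty `ManagedVF(1)` under a destination one, and call `migrate(src, dst, 1)`; the destination VF ends up running with the copied state while the source VF stays suspended.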
> At this point we don't yet have finalized support for migration in
> the virtio spec, just some WIP. While yes, this WIP is using
> PF for migration of VFs I think it is premature to claim that
> it's a done deal. E.g. Jason Wang is interested in looking
> into an in-band memory mapped transport for that. Probably in
> addition to the option of using the PF, not instead of it.
I'm aware of the effort to get vDPA migration merged
before virtio, but this just means that virtio will be behind other
specifications that do believe that using a management device + managed
device is the right thing to do.
What I don't understand is why we can't start merging the infrastructure
we have worked on for almost 2 years, in parallel to the unrelated vDPA efforts.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 11:27 ` Max Gurtovoy
@ 2023-01-18 13:29 ` Michael S. Tsirkin
0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-01-18 13:29 UTC (permalink / raw)
To: Max Gurtovoy
Cc: 侯英乐, Stefan Hajnoczi, virtio-comment,
Christoph Hellwig, Keith Busch, Kevin Wolf, Klaus Jensen,
sgarzare
On Wed, Jan 18, 2023 at 01:27:35PM +0200, Max Gurtovoy wrote:
> What I don't understand is why we can't start merging the infrastructure we
> worked on almost 2 years in parallel to unrelated vDPA efforts.
Is all this pertinent to the virtio specification in some way?
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 2:15 ` 侯英乐
@ 2023-01-18 14:08 ` Stefan Hajnoczi
2023-01-19 8:31 ` 侯英乐
2023-01-18 14:14 ` Stefan Hajnoczi
2023-01-19 3:38 ` Jason Wang
2 siblings, 1 reply; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-18 14:08 UTC (permalink / raw)
To: 侯英乐
Cc: jasowang, David Hildenbrand, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare,
Michael S. Tsirkin, Alex Williamson
[-- Attachment #1: Type: text/plain, Size: 11494 bytes --]
On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>
> >On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> >> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
> >> >On 17.01.23 03:04, 侯英乐 wrote:
> >>
> >> >> virtio-nvme advantages :
> >> >> 1) live migration
> >> >> 2) support remote storage
> >>
> >> >At least 1) is an implementation detail in the NVME implementation in
> >> >the hypervisor. I suspect 2) in a similar way, or is there a fundamental
> >> >issue with that?
> >>
> >> >One problematic thing about the NVME implementation in QEMU is that it
> >> >will pin (via vfio) all guest RAM. Could that be avoided using
> >> >virtio-NVME, or what exactly would be the difference between virtio-nvme
> >> >and ordinary NVME?
> >>
> >>
> >>
> >> In the virtualization scenario where devices are offloaded to hardware:
> >>
> >>
> >> NVME:
> >> ---------------------------------------------------------------------------------------------------------------------
> >> _____________________________________________________________________________
> >> | ___________________________________________________________ |
> >> | | _____________________________________________________ | |
> >> | | | | | |
> >> | | | __________________________________ | | |
> >> | | | | ______ | | | | ______
> >> | | | User | | Mem |-----------------------|----|-|----|-----> | |
> >> | | | | |______| SPDK | | | | (gVA) |______|
> >> | | | | (gVA) | | | | | |
> >> | | | |______|___________________________| | | | | |
> >> | | |--GuestOS----------|--------------------------------| | | | |
> >> | | | ______\/__________________________ | | | | |
> >> | | VM | | VFIO | | | | | |
> >> | User | | Kernel |___________ __________________| | | | | |
> >> | | | | vfio-pci | | | | | | | |
> >> | | | | (gIOVA) | | vfio_iommu_type1 | | | | | |
> >> Software | | | |______|____|___|__________________| | | | | |
> >> | | |___________________|________________________________| | | | |
> >> | | ___________________|________________________________ | | | |
> >> | | | ______|____ __________________ | | | \/ \/
> >> | | | | \/ | | | | | | ______
> >> | | | | NVME | | vIOMMU | | | | | |
> >> | | | QEMU | Instance | | --|----|-|----|-----> |______|
> >> | | | | (gIOVA) | | (gIOVA-->gPA) | | | | (gPA) | |
> >> | | | |_____|_____| |__________________| | | | | |
> >> | | |__________________|_________________________________| | | | |
> >> | |_______________________|___________________________________| | | |
> >> |---HostOS---------------------------|----------------------------------------| | |
> >> | _______________________|___________________________________ | | |
> >> | | | | | | |
> >> | | | VFIO | | | |
> >> | Kernel |_______________________|_____ _________________________| | | |
> >> | | \/ | | | | | |
> >> | | vfio-pci | | vfio_iommu_type1 | | | |
> >> | | | | | | | | |
> >> | |_______________________|_____|___|_________________________| | | |
> >> |____________________________________|________________________________________| | |
> >> -----------------------------------------------|-------------------------------------------------|----|--------------
> >> ____________________________________\/____ _________________________ ___\/___\/___________
> >> | | | | | | | | | |
> >> Hardware | | | DMA (gIOVA) --|---|-> IOMMU --|---------|->|______| |
> >> | | |_____________________________| | (gIOVA-->hPA) | (hPA) | Physical Memory |
> >> | DPU | | |_________________________| |_____________________|
> >> | | NVME-of |
> >> | |__________________________________|
> >> | | |
> >> |___________________________|______________|
> >> --------------------------------------|------------------------------------------------------------------------------
> >> | TCP (RDMA, and so on)
> >> ______________v__________
> >> | |
> >> Remote storage | |
> >> | Network Storage |
> >> | |
> >> |_________________________|
> >>
> >> ---------------------------------------------------------------------------------------------------------------------
> >>
> >>
> >> It is difficult to implement PCIe passthrough live migration.
>
> >Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
> >device migration protocol v2") defines the VFIO migration API and it's
> >implemented by several drivers in the kernel.
>
>
> Yes, this commit supports VFIO live migration, but the feature is a work in progress,
> recent submission: https://lore.kernel.org/all/20230116141135.12021-10-avihaih@nvidia.com/
>
>
> >Can you explain the difficulty of implementing PCIe passthrough live
> >migration in more detail?
>
> VFIO live migration requires the IOMMU to support dirty page tracking. Currently,
> no IOMMU device supports this feature, so VFIO live migration will take a long time to mature.
> For detailed information, see: https://www.qemu.org/docs/master/devel/vfio-migration.html
Can physical devices do their own dirty page tracking in the
meantime since they know which pages are being written to?
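One way to picture device-side dirty tracking: the device already sees every DMA write it issues, so it can set bits in a per-page bitmap that migration software reads and clears on each pre-copy round. This is a hypothetical Python sketch assuming 4 KiB pages and DMA addresses in the same address space the migration software uses; it does not model any real device's or VFIO's reporting interface.

```python
PAGE_SIZE = 4096  # assumed page granularity


class DirtyPageTracker:
    """Hypothetical device-side bitmap of pages written by DMA since the last query."""

    def __init__(self, mem_size: int, page_size: int = PAGE_SIZE) -> None:
        self.page_size = page_size
        self.num_pages = (mem_size + page_size - 1) // page_size
        self.bitmap = bytearray((self.num_pages + 7) // 8)

    def record_dma_write(self, addr: int, length: int) -> None:
        """Set the bit of every page touched by a DMA write of `length` bytes at `addr`."""
        if length <= 0:
            return
        first = addr // self.page_size
        last = (addr + length - 1) // self.page_size
        if last >= self.num_pages:
            raise ValueError("DMA write outside tracked memory")
        for pfn in range(first, last + 1):
            self.bitmap[pfn // 8] |= 1 << (pfn % 8)

    def read_and_clear(self) -> list[int]:
        """Return dirty page numbers and reset the bitmap, as one pre-copy round would."""
        dirty = [pfn for pfn in range(self.num_pages)
                 if self.bitmap[pfn // 8] & (1 << (pfn % 8))]
        self.bitmap = bytearray(len(self.bitmap))
        return dirty
```

In a pre-copy loop, the migration software would copy all guest memory once, then repeatedly copy only the pages returned by `read_and_clear()` until the dirty set is small enough to stop the device and transfer the remainder.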
I have CCed Alex Williamson regarding VFIO.
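For reference on the commit cited above, here is a simplified model of the VFIO migration v2 state machine, based on my reading of the arcs documented in `include/uapi/linux/vfio.h`; it omits the ERROR state and the later PRE_COPY states. A breadth-first search recovers the multi-step path for a requested transition, which is how I understand the kernel's `vfio_mig_get_next_state()` behaves (shortest path over single-step arcs). Treat the arc set as an assumption to be checked against the header.

```python
from collections import deque

# Subset of the VFIO migration v2 device states.
RUNNING, STOP, STOP_COPY, RESUMING, RUNNING_P2P = (
    "RUNNING", "STOP", "STOP_COPY", "RESUMING", "RUNNING_P2P")


def arcs(p2p_supported: bool) -> dict[str, set[str]]:
    """Single-step arcs; with P2P support, RUNNING and STOP are linked via RUNNING_P2P."""
    graph = {
        STOP: {STOP_COPY, RESUMING},
        STOP_COPY: {STOP},
        RESUMING: {STOP},
    }
    if p2p_supported:
        graph[RUNNING] = {RUNNING_P2P}
        graph[RUNNING_P2P] = {RUNNING, STOP}
        graph[STOP].add(RUNNING_P2P)
    else:
        graph[RUNNING] = {STOP}
        graph[STOP].add(RUNNING)
    return graph


def path(src: str, dst: str, p2p_supported: bool = True) -> list[str]:
    """Shortest sequence of states from src to dst (breadth-first search)."""
    graph = arcs(p2p_supported)
    prev: dict[str, str | None] = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt in sorted(graph.get(cur, ())):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    if dst not in prev:
        raise ValueError(f"no path from {src} to {dst}")
    out: list[str] = []
    node: str | None = dst
    while node is not None:
        out.append(node)
        node = prev[node]
    return out[::-1]
```

For example, asking a P2P-capable device to go from RUNNING to STOP_COPY walks RUNNING, RUNNING_P2P, STOP, STOP_COPY, while a device without P2P support goes RUNNING, STOP, STOP_COPY.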
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 2:15 ` 侯英乐
2023-01-18 14:08 ` Stefan Hajnoczi
@ 2023-01-18 14:14 ` Stefan Hajnoczi
2023-01-19 3:40 ` Jason Wang
2023-01-19 9:03 ` 侯英乐
2023-01-19 3:38 ` Jason Wang
2 siblings, 2 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-18 14:14 UTC (permalink / raw)
To: 侯英乐
Cc: jasowang, David Hildenbrand, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare,
Michael S. Tsirkin
[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]
On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
> >On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> >> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
> >> >On 17.01.23 03:04, 侯英乐 wrote:
> >The two diagrams are quite similar. Did you want to highlight a
> >difference between the two approaches in the diagram?
>
> The biggest difference is the VFIO and vDPA frameworks. The vDPA (virtio data path acceleration) kernel framework
> is a pillar in productizing the end-to-end vDPA solution and it enables NIC vendors to integrate their vDPA NIC kernel
> drivers into the framework as part of their productization efforts.
> Detailed information reference: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
For the sake of the argument, let's assume VFIO can't be used in your
situation so vDPA is required. The part I don't understand is which
specific NVMe features you need that virtio-blk lacks?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 2:15 ` 侯英乐
2023-01-18 14:08 ` Stefan Hajnoczi
2023-01-18 14:14 ` Stefan Hajnoczi
@ 2023-01-19 3:38 ` Jason Wang
2023-01-19 7:22 ` 侯英乐
2 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-01-19 3:38 UTC (permalink / raw)
To: virtio-comment
On 2023/1/18 10:15, 侯英乐 wrote:
> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>
>> On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>>> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>>>> On 17.01.23 03:04, 侯英乐 wrote:
>>>
>>>
>>>>> virtio-nvme advantages :
>>>>> 1) live migration
>>>>> 2) support remote storage
>>>
>>>
>>>> At least 1) is an implementation detail in the NVME implementation in
>>>> the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>>>> issue with that?
>>>
>>>
>>>> One problematic thing about the NVME implementation in QEMU is that it
>>>> will pin (via vfio) all guest RAM. Could that be avoided using
>>>> virtio-NVME, or what exactly would be the difference between virtio-nvme
>>>> and ordinary NVME?
>>>
>>>
>>> In the virtualization scenario where devices are offloaded to hardware:
>>>
>>>
>>> NVME:
>>> ---------------------------------------------------------------------------------------------------
>>> Software
>>>   Host user space
>>>     VM
>>>       Guest OS, user space:  SPDK; Mem (gVA)
>>>       Guest OS, kernel:      VFIO: vfio-pci (gIOVA), vfio_iommu_type1
>>>       QEMU:                  NVME Instance (gIOVA); vIOMMU (gIOVA-->gPA)
>>>   Host OS kernel:            VFIO: vfio-pci, vfio_iommu_type1
>>> Hardware
>>>   DPU:                       NVME-of; DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>>> Remote storage:              Network Storage, reached via TCP (RDMA, and so on)
>>>
>>> I/O path:      SPDK --> guest VFIO (vfio-pci) --> QEMU NVME Instance --> host VFIO (vfio-pci)
>>>                --> DPU (NVME-of) --> TCP (RDMA, and so on) --> Network Storage
>>> Guest memory:  Mem (gVA) in the guest; gPA after the vIOMMU (gIOVA-->gPA);
>>>                hPA in Physical Memory after the IOMMU (gIOVA-->hPA)
>>> ---------------------------------------------------------------------------------------------------
>>>
>>>
>>> It is difficult to implement PCIe passthrough live migration.
>
>
>
>> Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
>> device migration protocol v2") defines the VFIO migration API and it's
>> implemented by several drivers in the kernel.
>
> Yes, this commit supports VFIO live migration, but the feature is still a work in progress;
> see a recent submission: https://lore.kernel.org/all/20230116141135.12021-10-avihaih@nvidia.com/
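For readers unfamiliar with the VFIO migration protocol v2 mentioned above, the following minimal Python sketch models its basic device state machine. The state names follow the VFIO_DEVICE_STATE_* constants in the Linux uAPI, but this is only an illustration under stated simplifications: ERROR and the optional RUNNING_P2P and PRE_COPY states are left out, and the function transition_path is hypothetical. It shows that a requested transition such as RUNNING to STOP_COPY is carried out as a sequence of single-step arcs.

```python
from collections import deque

# Device states modelled after the VFIO_DEVICE_STATE_* names in the Linux
# uAPI. ERROR and the optional RUNNING_P2P / PRE_COPY states are omitted.
STOP, RUNNING, STOP_COPY, RESUMING = "STOP", "RUNNING", "STOP_COPY", "RESUMING"

# Single-step arcs of the basic state machine (simplified model).
ARCS = {
    RUNNING: {STOP},
    STOP: {RUNNING, STOP_COPY, RESUMING},
    STOP_COPY: {STOP},
    RESUMING: {STOP},
}


def transition_path(cur, new):
    """Return the shortest sequence of states from cur to new (breadth-first search)."""
    queue = deque([[cur]])
    seen = {cur}
    while queue:
        path = queue.popleft()
        if path[-1] == new:
            return path
        for nxt in sorted(ARCS[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError(f"no path from {cur} to {new}")


# Source side: stop the device, then read its state out.
print(transition_path(RUNNING, STOP_COPY))   # ['RUNNING', 'STOP', 'STOP_COPY']
# Destination side: after loading the state, resume the device.
print(transition_path(RESUMING, RUNNING))    # ['RESUMING', 'STOP', 'RUNNING']
```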
>
>
>> Can you explain the difficulty of implementing PCIe passthrough live
>> migration in more detail?
> VFIO live migration requires the IOMMU to support dirty page tracking. Currently,
> no IOMMU device supports this feature, so VFIO live migration will take a long time.
> For details, see: https://www.qemu.org/docs/master/devel/vfio-migration.html
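To illustrate why dirty page tracking matters, here is a toy pre-copy model in Python. It is a simplification, not QEMU's actual algorithm, and the function precopy and its parameters are hypothetical. With a dirty log, each round only resends the pages written during the previous round, so the remaining set drops below the downtime budget; without one, every DMA-mapped page has to be assumed dirty, so pre-copy never converges and all of guest memory is left for the stop-and-copy phase.

```python
def precopy(total_pages, dirty_per_round, downtime_budget, tracked, max_rounds=30):
    """Toy model of iterative pre-copy live migration.

    Returns (number of pre-copy rounds, pages left for the stop-and-copy phase).
    """
    to_send = total_pages  # the first round has to send all of guest memory
    rounds = 0
    while to_send > downtime_budget and rounds < max_rounds:
        rounds += 1  # send `to_send` pages while the VM keeps running
        if tracked:
            # A dirty log tells us which pages were written during this round.
            to_send = min(dirty_per_round, total_pages)
        else:
            # No dirty log: every DMA-mapped page must be treated as dirty again.
            to_send = total_pages
    return rounds, to_send


# 1M pages of guest memory, 5k pages dirtied per round, 10k pages of downtime budget.
print(precopy(1_000_000, 5_000, 10_000, tracked=True))   # (1, 5000)
print(precopy(1_000_000, 5_000, 10_000, tracked=False))  # (30, 1000000)
```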
>
>
>
>
>>> virtio-nvme:
>>> ---------------------------------------------------------------------------------------------------
>>> Software
>>>   Host user space
>>>     VM
>>>       Guest OS, user space:  SPDK; Mem (gVA)
>>>       Guest OS, kernel:      VFIO: vfio-pci (gIOVA), vfio_iommu_type1
>>>       QEMU:                  virtio-NVME Instance (vhost-vdpa, gIOVA); vIOMMU (gIOVA-->gPA)
>>>   Host OS kernel:            vDPA: vdpa-device (Virtual device)
>>> Hardware
>>>   DPU:                       virtio-nvme-of; DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>>> Remote storage:              Network Storage, reached via TCP (RDMA, and so on)
>>>
>>> I/O path:      SPDK --> guest VFIO (vfio-pci) --> QEMU virtio-NVME Instance (vhost-vdpa)
>>>                --> host vDPA (vdpa-device) --> DPU (virtio-nvme-of) --> TCP (RDMA, and so on)
>>>                --> Network Storage
>>> Guest memory:  Mem (gVA) in the guest; gPA after the vIOMMU (gIOVA-->gPA);
>>>                hPA in Physical Memory after the IOMMU (gIOVA-->hPA)
>>> ---------------------------------------------------------------------------------------------------
>>> Based on the vDPA framework, it supports live migration.
>
>
>
>
>> The two diagrams are quite similar. Did you want to highlight a
>> difference between the two approaches in the diagram?
> The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
> is a pillar in productizing the end-to-end vDPA solution, and it enables NIC vendors to integrate their vDPA NIC kernel
> drivers into the framework as part of their productization efforts.
> For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
Note that vDPA is not solely a software concept but also a hardware one.
And with vDPA kernel support, the device can talk to the kernel I/O
subsystem via virtio-vDPA bus drivers.
Thanks
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 14:14 ` Stefan Hajnoczi
@ 2023-01-19 3:40 ` Jason Wang
2023-01-19 16:59 ` Stefan Hajnoczi
2023-01-19 9:03 ` 侯英乐
1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2023-01-19 3:40 UTC (permalink / raw)
To: Stefan Hajnoczi, 侯英乐
Cc: David Hildenbrand, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare, Michael S. Tsirkin
On 2023/1/18 22:14, Stefan Hajnoczi wrote:
> On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>>> On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>>>> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>>>>> On 17.01.23 03:04, 侯英乐 wrote:
>>> The two diagrams are quite similar. Did you want to highlight a
>>> difference between the two approaches in the diagram?
>> The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
>> is a pillar in productizing the end-to-end vDPA solution, and it enables NIC vendors to integrate their vDPA NIC kernel
>> drivers into the framework as part of their productization efforts.
>> For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
> For the sake of the argument, let's assume VFIO can't be used in your
> situation so vDPA is required. The part I don't understand is which
> specific NVMe features you need that virtio-blk lacks?
I can think of one:
avoiding the need to migrate guest applications from NVMe to virtio-blk?
Thanks
>
> Stefan
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 3:38 ` Jason Wang
@ 2023-01-19 7:22 ` 侯英乐
0 siblings, 0 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-19 7:22 UTC (permalink / raw)
To: jasowang, virtio-comment
On Thu, 19 Jan 2023 11:38:51 +0800, Jason Wang wrote:
>On 2023/1/18 10:15, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>>
>> On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>>>> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>>>> On 17.01.23 03:04, 侯英乐 wrote:
>>>>
>>>>
>>>>>> virtio-nvme advantages:
>>>>>> 1) live migration
>>>>>> 2) support remote storage
>>>>
>>>>
>>>>> At least 1) is an implementation detail in the NVME implementation in
>>>>> the hypervisor. I suspect 2) in a similar way, or is there a fundamental
>>>>> issue with that?
>>>>
>>>>
>>>>> One problematic thing about the NVME implementation in QEMU is that it
>>>>> will pin (via vfio) all guest RAM. Could that be avoided using
>>>>> virtio-NVME, or what exactly would be the difference between virtio-nvme
>>>>> and ordinary NVME?
>>>>
>>>>
>>>> In the virtualization scenario where devices are offloaded to hardware:
>>>>
>>>>
>>>> NVME:
>>>> ---------------------------------------------------------------------------------------------------
>>>> Software
>>>>   Host user space
>>>>     VM
>>>>       Guest OS, user space:  SPDK; Mem (gVA)
>>>>       Guest OS, kernel:      VFIO: vfio-pci (gIOVA), vfio_iommu_type1
>>>>       QEMU:                  NVME Instance (gIOVA); vIOMMU (gIOVA-->gPA)
>>>>   Host OS kernel:            VFIO: vfio-pci, vfio_iommu_type1
>>>> Hardware
>>>>   DPU:                       NVME-of; DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>>>> Remote storage:              Network Storage, reached via TCP (RDMA, and so on)
>>>>
>>>> I/O path:      SPDK --> guest VFIO (vfio-pci) --> QEMU NVME Instance --> host VFIO (vfio-pci)
>>>>                --> DPU (NVME-of) --> TCP (RDMA, and so on) --> Network Storage
>>>> Guest memory:  Mem (gVA) in the guest; gPA after the vIOMMU (gIOVA-->gPA);
>>>>                hPA in Physical Memory after the IOMMU (gIOVA-->hPA)
>>>> ---------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> It is difficult to implement PCIe passthrough live migration.
>>
>>
>>
>>> Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
>>> device migration protocol v2") defines the VFIO migration API and it's
>>> implemented by several drivers in the kernel.
>>
>> Yes, this commit supports VFIO live migration, but the feature is still a work in progress;
>> see a recent submission: https://lore.kernel.org/all/20230116141135.12021-10-avihaih@nvidia.com/
>>
>>
>>> Can you explain the difficulty of implementing PCIe passthrough live
>>> migration in more detail?
>> VFIO live migration requires the IOMMU to support dirty page tracking. Currently,
>> no IOMMU device supports this feature, so VFIO live migration will take a long time.
>> For details, see: https://www.qemu.org/docs/master/devel/vfio-migration.html
>>
>>
>>
>>
>>>> virtio-nvme:
>>>> ---------------------------------------------------------------------------------------------------
>>>> Software
>>>>   Host user space
>>>>     VM
>>>>       Guest OS, user space:  SPDK; Mem (gVA)
>>>>       Guest OS, kernel:      VFIO: vfio-pci (gIOVA), vfio_iommu_type1
>>>>       QEMU:                  virtio-NVME Instance (vhost-vdpa, gIOVA); vIOMMU (gIOVA-->gPA)
>>>>   Host OS kernel:            vDPA: vdpa-device (Virtual device)
>>>> Hardware
>>>>   DPU:                       virtio-nvme-of; DMA (gIOVA) --> IOMMU (gIOVA-->hPA) --> Physical Memory (hPA)
>>>> Remote storage:              Network Storage, reached via TCP (RDMA, and so on)
>>>>
>>>> I/O path:      SPDK --> guest VFIO (vfio-pci) --> QEMU virtio-NVME Instance (vhost-vdpa)
>>>>                --> host vDPA (vdpa-device) --> DPU (virtio-nvme-of) --> TCP (RDMA, and so on)
>>>>                --> Network Storage
>>>> Guest memory:  Mem (gVA) in the guest; gPA after the vIOMMU (gIOVA-->gPA);
>>>>                hPA in Physical Memory after the IOMMU (gIOVA-->hPA)
>>>> ---------------------------------------------------------------------------------------------------
>>>> Based on the vDPA framework, it supports live migration.
>>
>>
>>
>>
>>> The two diagrams are quite similar. Did you want to highlight a
>>> difference between the two approaches in the diagram?
>> The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
>> is a pillar in productizing the end-to-end vDPA solution, and it enables NIC vendors to integrate their vDPA NIC kernel
>> drivers into the framework as part of their productization efforts.
>> For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
>
>
>Note that vDPA is not solely a software concept but also a hardware one.
>And with vDPA kernel support, the device can talk to the kernel I/O
>subsystem via virtio-vDPA bus drivers.
Yes, I wasn't precise enough. vDPA was created so that virtio devices can be offloaded to hardware; the hardware is also part of vDPA.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 14:08 ` Stefan Hajnoczi
@ 2023-01-19 8:31 ` 侯英乐
0 siblings, 0 replies; 40+ messages in thread
From: 侯英乐 @ 2023-01-19 8:31 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: jasowang, David Hildenbrand, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare,
Michael S. Tsirkin, Alex Williamson
On Wed, 18 Jan 2023 09:08:36 -0500, Stefan wrote:
>On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>> >Can you explain the difficulty of implementing PCIe passthrough live
>> >migration in more detail?
>>
>> VFIO live migration requires the IOMMU to support dirty page tracking. Currently,
>> no IOMMU device supports this feature, so VFIO live migration will take a long time.
>> For details, see: https://www.qemu.org/docs/master/devel/vfio-migration.html
>Can physical devices do their own dirty page tracking in the
>meantime, since they know which pages are being written to?
That's one solution, and I think it could work in theory, but I'm not aware of any current devices that support this feature.
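A minimal sketch of the device-side dirty page tracking Stefan asks about, assuming 4 KiB pages and a bitmap indexed by IOVA page number (both assumptions; DeviceDirtyLog and its methods are hypothetical names, not an existing API). The device marks each page touched by a DMA write, and the hypervisor periodically reads and clears the log during pre-copy.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class DeviceDirtyLog:
    """Hypothetical dirty-page log kept by a device for the pages it DMA-writes."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bitmap = bytearray((num_pages + 7) // 8)

    def record_dma_write(self, iova, length):
        """Mark every page touched by a DMA write of `length` bytes at `iova`."""
        if length <= 0:
            return
        first = iova >> PAGE_SHIFT
        last = (iova + length - 1) >> PAGE_SHIFT
        for pfn in range(first, last + 1):
            if pfn >= self.num_pages:
                raise ValueError(f"page {pfn} outside the tracked range")
            self.bitmap[pfn // 8] |= 1 << (pfn % 8)

    def read_and_clear(self):
        """Return the dirty page numbers in ascending order and reset the log."""
        dirty = [i * 8 + bit
                 for i, byte in enumerate(self.bitmap)
                 for bit in range(8)
                 if byte & (1 << bit)]
        self.bitmap = bytearray(len(self.bitmap))
        return dirty


log = DeviceDirtyLog(num_pages=16)
log.record_dma_write(0x1FF0, 0x20)  # crosses a page boundary: pages 1 and 2
log.record_dma_write(0x5000, 1)     # page 5
print(log.read_and_clear())         # [1, 2, 5]
print(log.read_and_clear())         # []
```

Reading and clearing in one step means each pre-copy round only sees pages written since the previous round, which is what the migration loop needs.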
>I have CCed Alex Williamson regarding VFIO.
Thanks,
Leo Hou/Houyingle
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 14:14 ` Stefan Hajnoczi
2023-01-19 3:40 ` Jason Wang
@ 2023-01-19 9:03 ` 侯英乐
2023-01-19 17:03 ` Stefan Hajnoczi
1 sibling, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-19 9:03 UTC (permalink / raw)
To: Stefan Hajnoczi, jasowang
Cc: jasowang, David Hildenbrand, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare,
Michael S. Tsirkin
Wed, 18 Jan 2023 09:14:41 -0500, Stefan wrote:
>On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
>> >On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
>> >> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
>> >> >On 17.01.23 03:04, 侯英乐 wrote:
>> >The two diagrams are quite similar. Did you want to highlight a
>> >difference between the two approaches in the diagram?
>>
>> The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
>> is a pillar in productizing the end-to-end vDPA solution, and it enables NIC vendors to integrate their vDPA NIC kernel
>> drivers into the framework as part of their productization efforts.
>> For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
>For the sake of the argument, let's assume VFIO can't be used in your
>situation so vDPA is required. The part I don't understand is which
>specific NVMe features you need that virtio-blk lacks?
In DPU chip design, the "Fabrics Connect" command is not supported on standard nvme-pci devices,
but I/O can still be delivered to remote storage by the back end of the nvme-pci device.
For a virtio-blk device, it is not clear to me how the virtio-blk back end connects to remote storage. Although
NVIDIA claims to support virtio-blk SNAP (Software-defined Network Accelerated Processing), their implementation
is not expected to become an open standard, and other vendors may have built theirs on proprietary specifications.
All of this is from a hardware offloading perspective. I see two possible solutions to the problem I'm facing:
1) Combine virtio and NVMe by adding a new virtio-nvme device type.
2) Add Fabrics-related commands to virtio-blk so that it supports virtio-blk-of (over Fabrics).
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 10:09 ` Max Gurtovoy
2023-01-18 11:12 ` Michael S. Tsirkin
@ 2023-01-19 10:19 ` 侯英乐
2023-01-19 10:33 ` Max Gurtovoy
1 sibling, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-19 10:19 UTC (permalink / raw)
To: Max Gurtovoy, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
Wed, 18 Jan 2023 12:09:59 +0200, Max Gurtovoy wrote:
>On 18/01/2023 5:23, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
>>
>>> On 17/01/2023 4:04, 侯英乐 wrote:
>>>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>>>>> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, is it necessary to add a device type Virtio-NvMe?
>>>>
>>>>
>>>>> Hi,
>>>>> In theory, yes, virtio-nvme can be done. The question is why do it?
>>>>> NVMe already provides a PCI hardware spec for software and hardware
>>>>> implementations to follow. An NVMe PCI device can be exposed to the
>>>>> guest and modern operating systems recognize it without requiring new
>>>>> drivers.
>>>>> The value of VIRTIO here is probably in the deep integration into the
>>>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>>>>> all these things whereas a PCI device needs to do everything from
>>>>> scratch.
>>>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration is supported.
>>>>> Let's not forget that virtio-blk is widely used and new commands are
>>>>> being added as needed. Which NVMe features are you missing in
>>>>> virtio-blk?
>>>> With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back end does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>>>> I guess this is why virtio-nvme hasn't been done before: people who want
>>>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>>>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>>>>> What advantages do you see in having virtio-nvme?
>>>> virtio-nvme advantages:
>>>> 1) live migration
>>>
>>> This is WIP and will use VFIO live migration framework.
>> Yes, VFIO live migration framework is WIP, but I still think vdpa is a friendlier framework.
>Not sure what you consider friendly ?
My personal opinion: VFIO live migration imposes requirements on the device design.
With vDPA-based live migration, however, the software-abstracted vDPA device in the vDPA
framework can do some of the state recording, so the design requirements for virtio devices
that are offloaded to hardware may be lower.
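As a rough illustration of the kind of state a software vDPA layer can record, the sketch below tracks, per virtqueue, the next available-ring index the device will consume, which vhost exposes through the VHOST_GET_VRING_BASE and VHOST_SET_VRING_BASE ioctls. MediatedVirtioDevice and its methods are hypothetical, and in this sketch saving simply refuses while requests are still in flight; a real implementation has to drain or otherwise account for them.

```python
from dataclasses import dataclass, field


@dataclass
class VirtqueueState:
    last_avail_idx: int = 0  # next available-ring entry the device will consume
    in_flight: int = 0       # requests fetched but not yet completed


@dataclass
class MediatedVirtioDevice:
    """Hypothetical software layer in front of an offloaded virtio back end
    that records per-virtqueue progress so it can be migrated."""
    queues: dict = field(default_factory=dict)

    def _q(self, qid):
        return self.queues.setdefault(qid, VirtqueueState())

    def on_request_fetched(self, qid):
        q = self._q(qid)
        q.last_avail_idx = (q.last_avail_idx + 1) & 0xFFFF  # 16-bit ring index wraps
        q.in_flight += 1

    def on_request_completed(self, qid):
        q = self._q(qid)
        if q.in_flight == 0:
            raise RuntimeError("completion without a fetched request")
        q.in_flight -= 1

    def save(self):
        # In this sketch, saving requires the back end to be quiesced first.
        if any(q.in_flight for q in self.queues.values()):
            raise RuntimeError("drain in-flight requests before saving")
        return {qid: q.last_avail_idx for qid, q in self.queues.items()}

    def restore(self, snapshot):
        self.queues = {qid: VirtqueueState(idx) for qid, idx in snapshot.items()}
```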
>The community agreed that in SR-IOV, VF migration is done via the PF interface.
>Any device specific migration (e.g. vdpa/virtio) is not as generic as
>VFIO migration. Also it will be maintained by a smaller group of engineers.
>If you would like to use vdpa, I suggest using virtio-blk rather than
>inventing a virtio-nvme device that will certainly have a smaller feature set
>than pure NVMe.
>In case you're missing some feature in virtio-blk that exist in NVMe,
>you're welcome to submit a proposal to the technical group with that
>feature.
Yes, this is good advice.
I wonder whether it is feasible to add Fabrics-related commands to virtio-blk
so that it supports virtio-blk-of (over Fabrics).
>NVIDIA also has a DPU-based physical virtio-blk device (NVIDIA's
>virtio-blk SNAP) that supports SR-IOV and remote storage access.
For remote storage access, how is the physical Virtio-blk device's back-end implemented?
What protocol is used?
Is it an open source solution?
>Live migration specification is WIP in both the NVMe and Virtio working
>groups. I can't say which will be merged first.
Yes, I agree with you on that point.
>>>
>>>> 2) support remote storage
>>> There are solutions today that can use remote storage as an NVMe
>>> Namespace. For example, DPU-based NVMe devices such as NVIDIA's NVMe SNAP
>>> device.
>>
>>
>> Yes, you're right. NVMe has a built-in advantage over virtio-blk for hardware offloading.
>> The reason I propose virtio-nvme is to combine NVMe and virtio, so that NVMe
>> can fit into the virtio ecosystem, such as vDPA, based on the virtio interface specifications.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 10:19 ` 侯英乐
@ 2023-01-19 10:33 ` Max Gurtovoy
2023-01-19 11:02 ` 侯英乐
0 siblings, 1 reply; 40+ messages in thread
From: Max Gurtovoy @ 2023-01-19 10:33 UTC (permalink / raw)
To: 侯英乐, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
On 19/01/2023 12:19, 侯英乐 wrote:
> Wed, 18 Jan 2023 12:09:59 +0200, Max Gurtovoy wrote:
>> On 18/01/2023 5:23, 侯英乐 wrote:
>>> On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
>>>> On 17/01/2023 4:04, 侯英乐 wrote:
>>>>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>>>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>>>>>> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, is it necessary to add a device type Virtio-NvMe?
>>>>>
>>>>>> Hi,
>>>>>> In theory, yes, virtio-nvme can be done. The question is why do it?
>>>>>> NVMe already provides a PCI hardware spec for software and hardware
>>>>>> implementations to follow. An NVMe PCI device can be exposed to the
>>>>>> guest and modern operating systems recognize it without requiring new
>>>>>> drivers.
>>>>>> The value of VIRTIO here is probably in the deep integration into the
>>>>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>>>>>> all these things whereas a PCI device needs to do everything from
>>>>>> scratch.
>>>>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration is supported.
>>>>>> Let's not forget that virtio-blk is widely used and new commands are
>>>>>> being added as needed. Which NVMe features are you missing in
>>>>>> virtio-blk?
>>>>> With the introduction of the DPU concept, many vendors are offloading virtual devices to hardware. The virtio-blk back end does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>>>>> I guess this is why virtio-nvme hasn't been done before: people who want
>>>>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>>>>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>>>>>> What advantages do you see in having virtio-nvme?
>>>>> virtio-nvme advantages:
>>>>> 1) live migration
>>>>
>>>> This is WIP and will use VFIO live migration framework.
>>> Yes, VFIO live migration framework is WIP, but I still think vdpa is a friendlier framework.
>
>
>> Not sure what you consider friendly ?
> My personal opinion: VFIO live migration imposes requirements on the device design.
> With vDPA-based live migration, however, the software-abstracted vDPA device in the vDPA
> framework can do some of the state recording, so the design requirements for virtio devices
> that are offloaded to hardware may be lower.
>
>
>
>
>> The community agreed that in SR-IOV, VF migration is done via the PF interface.
>
>
>> Any device specific migration (e.g. vdpa/virtio) is not as generic as
>> VFIO migration. Also it will be maintained by a smaller group of engineers.
>
>> If you would like to use vdpa, I suggest using virtio-blk rather than
>> inventing a virtio-nvme device that will certainly have a smaller feature set
>> than pure NVMe.
>
>
>> In case you're missing some feature in virtio-blk that exist in NVMe,
>> you're welcome to submit a proposal to the technical group with that
>> feature.
>
> Yes, this is good advice.
> I wonder whether it is feasible to add Fabrics-related commands to virtio-blk
> so that it supports virtio-blk-of (over Fabrics).
I'm totally confused.
I thought you were trying to build a virtualized environment and
were looking for storage devices that support live migration.
How will virtio-blk-of help here?
And how will it be better than using existing over-fabrics solutions that
can be the backend of the storage device (iSCSI, nvmf, etc.)?
>
>
>
>
>> NVIDIA also has a DPU-based physical virtio-blk device (NVIDIA's
>> virtio-blk SNAP) that supports SR-IOV and remote storage access.
>
> For remote storage access, how is the physical Virtio-blk device's back-end implemented?
> What protocol is used?
> Is it an open source solution?
>
>
>
>
>> Live migration specification is WIP in both the NVMe and Virtio working
>> groups. I can't say which will be merged first.
> Yes, I agree with you on that point.
>
>
>
>>>>
>>>>> 2) support remote storage
>>>> There are solutions today that can use remote storage as an NVMe
>>>> namespace. For example, a DPU-based NVMe device such as NVIDIA's NVMe SNAP
>>>> device.
>>> Yes, you're right. NVMe has a built-in advantage over virtio-blk for hardware offloading.
>>> The reason I propose virtio-nvme is to combine NVMe and virtio, so that NVMe
>>> can fit into the virtio ecosystem built on the virtio interface specifications, such as vDPA.
This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.
In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.
Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 10:33 ` Max Gurtovoy
@ 2023-01-19 11:02 ` 侯英乐
2023-01-19 17:15 ` Stefan Hajnoczi
0 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-19 11:02 UTC (permalink / raw)
To: Max Gurtovoy, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare, Michael S. Tsirkin
Thu, 19 Jan 2023 12:33:54 +0200, Max Gurtovoy wrote:
>On 19/01/2023 12:19, 侯英乐 wrote:
>> Wed, 18 Jan 2023 12:09:59 +0200, Max Gurtovoy wrote:
>>> On 18/01/2023 5:23, 侯英乐 wrote:
>>>> On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
>>>>> On 17/01/2023 4:04, 侯英乐 wrote:
>>>>>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>>>>>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>>>>>>>> As we know, NVMe has more features than virtio-blk. For example, with the development of virtualization I/O offloading to hardware, virtio-blk and NVMe-oF offloading to hardware are developing rapidly. So if virtio and NVMe are combined into virtio-nvme, is it necessary to add a virtio-nvme device type?
>>>>>>
>>>>>>> Hi,
>>>>>>> In theory, yes, virtio-nvme can be done. The question is why do it?
>>>>>>> NVMe already provides a PCI hardware spec for software and hardware
>>>>>>> implementations to follow. An NVMe PCI device can be exposed to the
>>>>>>> guest and modern operating systems recognize it without requiring new
>>>>>>> drivers.
>>>>>>> The value of VIRTIO here is probably in the deep integration into the
>>>>>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>>>>>>> all these things whereas a PCI device needs to do everything from
>>>>>>> scratch.
>>>>>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
>>>>>>> Let's not forget that virtio-blk is widely used and new commands are
>>>>>>> being added as needed. Which NVMe features are you missing in
>>>>>>> virtio-blk?
>>>>>> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>>>>>> I guess this is why virtio-nvme hasn't been done before: people who want
>>>>>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
>>>>>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
>>>>>>> What advantages do you see in having virtio-nvme?
>>>>>> virtio-nvme advantages:
>>>>>> 1) live migration
>>>>>
>>>>> This is WIP and will use the VFIO live migration framework.
>>>> Yes, the VFIO live migration framework is WIP, but I still think vDPA is a friendlier framework.
>>
>>
>>> Not sure what you consider friendly?
>> My personal opinion: VFIO live migration places requirements on the device design.
>> With vDPA-based live migration, the software-abstracted vDPA device in the vDPA
>> framework can do some of the state recording, so the design requirements for virtio
>> devices offloaded to hardware may be lower.
>>
>>
>>
>>
>>> The community agreed that, with SR-IOV, VF migration is done via the PF interface.
>>
>>
>>> Any device-specific migration (e.g. vDPA/virtio) is not as generic as
>>> VFIO migration. It will also be maintained by a smaller group of engineers.
>>
>>> If you would like to use vDPA, I suggest using virtio-blk rather than
>>> inventing a virtio-nvme device that will certainly have a smaller feature set
>>> than pure NVMe.
>>
>>
>>> In case you're missing some feature in virtio-blk that exists in NVMe,
>>> you're welcome to submit a proposal to the technical group with that
>>> feature.
>>
>> Yes, this is good advice.
>> I wonder whether it is feasible to add Fabrics-related commands to virtio-blk
>> so that it supports virtio-blk-of (over Fabrics).
>
>I'm totally confused.
>I thought you were trying to build some virtualized environment and
>were looking for storage devices that support live migration.
>How will virtio-blk-of assist here?
I would like to promote virtio storage in hardware offloading scenarios,
enabling open-source solutions that support remote storage access.
That is why we are discussing the need for virtio-nvme and virtio-blk-of.
>And how will it be better than using existing over-fabric solutions that
>can serve as the backend of the storage device (iSCSI, NVMe-oF, etc.)?
>>
>>
>>
>>
>>> NVIDIA also has a DPU-based physical virtio-blk device (NVIDIA's
>>> virtio-blk SNAP) that supports SR-IOV and remote storage access.
>>
>> For remote storage access, how is the physical virtio-blk device's back-end implemented?
>> What protocol is used?
>> Is it an open-source solution?
>>
>>
>>
>>
>>> Live migration specification work is WIP in both the NVMe and VIRTIO working
>>> groups. I can't say which will be merged first.
>> Yes, I agree with you on that point.
>>
>>
>>
>>>>>
>>>>>> 2) support remote storage
>>>>> There are solutions today that can use remote storage as an NVMe
>>>>> namespace. For example, a DPU-based NVMe device such as NVIDIA's NVMe SNAP
>>>>> device.
>>>> Yes, you're right. NVMe has a built-in advantage over virtio-blk for hardware offloading.
>>>> The reason I propose virtio-nvme is to combine NVMe and virtio, so that NVMe
>>>> can fit into the virtio ecosystem built on the virtio interface specifications, such as vDPA.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-17 16:01 ` Stefan Hajnoczi
[not found] ` <20230117162114.GA24976@lst.de>
2023-01-18 2:49 ` 侯英乐
@ 2023-01-19 11:42 ` Michael S. Tsirkin
2023-01-29 3:32 ` 侯英乐
2 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-01-19 11:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: 侯英乐, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare
On Tue, Jan 17, 2023 at 11:01:37AM -0500, Stefan Hajnoczi wrote:
> > The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
> >
> >
> > >Let's not forget that virtio-blk is widely used and new commands are
> > >being added as needed. Which NVMe features are you missing in
> > >virtio-blk?
> >
> > With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>
> virtio-blk is just a storage interface, whether that storage is local or
> remote is up to the device implementation. The block device could be
> located on Ceph, NFS, etc.
>
> Each virtio-blk device is a single block device. There is no
> standardized management protocol in virtio-blk for connecting to remote
> block devices. I'm aware of hardware virtio-blk devices that connect to
> remote storage. Configuration is performed through an out-of-band
> management interface.
>
> Maybe when you say virtio-blk doesn't support remote storage this is
> what you mean?
>
> Stefan
侯英乐, you never answered this question.
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 3:40 ` Jason Wang
@ 2023-01-19 16:59 ` Stefan Hajnoczi
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-19 16:59 UTC (permalink / raw)
To: Jason Wang
Cc: 侯英乐, David Hildenbrand, virtio-comment,
Christoph Hellwig, Keith Busch, Kevin Wolf, Klaus Jensen,
sgarzare, Michael S. Tsirkin
[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]
On Thu, Jan 19, 2023 at 11:40:02AM +0800, Jason Wang wrote:
>
> On 2023/1/18 22:14, Stefan Hajnoczi wrote:
> > On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
> > > On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
> > > > On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> > > > > On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
> > > > > > On 17.01.23 03:04, 侯英乐 wrote:
> > > > The two diagrams are quite similar. Did you want to highlight a
> > > > difference between the two approaches in the diagram?
> > > The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
> > > is a pillar in productizing the end-to-end vDPA solution and it enables NIC vendors to integrate their vDPA NIC kernel
> > > drivers into the framework as part of their productization efforts.
> > > For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
> > For the sake of the argument, let's assume VFIO can't be used in your
> > situation so vDPA is required. The part I don't understand is which
> > specific NVMe features you need that virtio-blk lacks?
>
>
> I can think one:
>
> Avoid guest application migration from NVMe to virtio-blk?
To get the best fidelity in that situation NVMe PCI would be the natural
choice. For example, if the application is SPDK then it won't just work
with virtio-nvme because it has a userspace NVMe PCI driver.
There might be applications that break when moving from NVMe to
virtio-blk but don't depend on NVMe PCI, but it seems like a very niche
case.
Most applications don't use NVMe directly or if they do, then they speak
NVMe PCI or NVMe over TCP directly, so they won't work with virtio-nvme.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 9:03 ` 侯英乐
@ 2023-01-19 17:03 ` Stefan Hajnoczi
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-19 17:03 UTC (permalink / raw)
To: 侯英乐
Cc: jasowang, David Hildenbrand, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare,
Michael S. Tsirkin
[-- Attachment #1: Type: text/plain, Size: 2177 bytes --]
On Thu, Jan 19, 2023 at 05:03:38PM +0800, 侯英乐 wrote:
> Wed, 18 Jan 2023 09:14:41 -0500, Stefan wrote:
> >On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
> >> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
> >> >On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> >> >> On Tue, 17 Jan 2023 09:32:05 +0100, David wrote:
> >> >> >On 17.01.23 03:04, 侯英乐 wrote:
> >> >The two diagrams are quite similar. Did you want to highlight a
> >> >difference between the two approaches in the diagram?
> >>
> >> The biggest difference is the framework: VFIO versus vDPA. The vDPA (virtio data path acceleration) kernel framework
> >> is a pillar in productizing the end-to-end vDPA solution and it enables NIC vendors to integrate their vDPA NIC kernel
> >> drivers into the framework as part of their productization efforts.
> >> For details, see: https://www.redhat.com/en/blog/introduction-vdpa-kernel-framework
>
> >For the sake of the argument, let's assume VFIO can't be used in your
> >situation so vDPA is required. The part I don't understand is which
> >specific NVMe features you need that virtio-blk lacks?
>
> During the DPU chip design process, "Fabrics Connect" commands are not supported on standard nvme-pci devices,
> but I/O can still be delivered to remote storage by the back-end of the nvme-pci device.
>
> In the case of a virtio-blk device, it is not clear to me how the virtio-blk back-end connects to remote storage. Although
> NVIDIA claims to support virtio-blk SNAP (Software-defined Network Accelerated Processing), their implementation
> is not expected to become an open standard, and other vendors may have built theirs on proprietary specifications.
>
> All of this is from a hardware offloading perspective. There are two solutions to the problem I'm facing:
Wait, what is the problem you are facing? Why do you need NVMe?
> 1) Combine virtio and NVMe by adding a new virtio-nvme device.
> 2) Add Fabrics-related commands to virtio-blk so that it supports virtio-blk-of (over Fabrics).
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 11:02 ` 侯英乐
@ 2023-01-19 17:15 ` Stefan Hajnoczi
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-19 17:15 UTC (permalink / raw)
To: 侯英乐
Cc: Max Gurtovoy, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare, Michael S. Tsirkin
[-- Attachment #1: Type: text/plain, Size: 5499 bytes --]
On Thu, Jan 19, 2023 at 07:02:00PM +0800, 侯英乐 wrote:
> Thu, 19 Jan 2023 12:33:54 +0200, Max Gurtovoy wrote:
> >On 19/01/2023 12:19, 侯英乐 wrote:
> >> Wed, 18 Jan 2023 12:09:59 +0200, Max Gurtovoy wrote:
> >>> On 18/01/2023 5:23, 侯英乐 wrote:
> >>>> On Tue, 17 Jan 2023 19:19:59 +0200, Max Gurtovoy wrote:
> >>>>> On 17/01/2023 4:04, 侯英乐 wrote:
> >>>>>> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
> >>>>>>>> On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
> >>>>>>>> As we know, NVMe has more features than virtio-blk. For example, with the development of virtualization I/O offloading to hardware, virtio-blk and NVMe-oF offloading to hardware are developing rapidly. So if virtio and NVMe are combined into virtio-nvme, is it necessary to add a virtio-nvme device type?
> >>>>>>
> >>>>>>> Hi,
> >>>>>>> In theory, yes, virtio-nvme can be done. The question is why do it?
> >>>>>>> NVMe already provides a PCI hardware spec for software and hardware
> >>>>>>> implementations to follow. An NVMe PCI device can be exposed to the
> >>>>>>> guest and modern operating systems recognize it without requiring new
> >>>>>>> drivers.
> >>>>>>> The value of VIRTIO here is probably in the deep integration into the
> >>>>>>> virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
> >>>>>>> all these things whereas a PCI device needs to do everything from
> >>>>>>> scratch.
> >>>>>> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
> >>>>>>> Let's not forget that virtio-blk is widely used and new commands are
> >>>>>>> being added as needed. Which NVMe features are you missing in
> >>>>>>> virtio-blk?
> >>>>>> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
> >>>>>>> I guess this is why virtio-nvme hasn't been done before: people who want
> >>>>>>> NVMe can already do NVMe PCI, people who want VIRTIO can use virtio-blk,
> >>>>>>> and so there hasn't been a great need to combine VIRTIO and NVMe yet.
> >>>>>>> What advantages do you see in having virtio-nvme?
> >>>>>> virtio-nvme advantages:
> >>>>>> 1) live migration
> >>>>>
> >>>>> This is WIP and will use the VFIO live migration framework.
> >>>> Yes, the VFIO live migration framework is WIP, but I still think vDPA is a friendlier framework.
> >>
> >>
> >>> Not sure what you consider friendly?
> >> My personal opinion: VFIO live migration places requirements on the device design.
> >> With vDPA-based live migration, the software-abstracted vDPA device in the vDPA
> >> framework can do some of the state recording, so the design requirements for virtio
> >> devices offloaded to hardware may be lower.
> >>
> >>
> >>
> >>
> >>> The community agreed that, with SR-IOV, VF migration is done via the PF interface.
> >>
> >>
> >>> Any device-specific migration (e.g. vDPA/virtio) is not as generic as
> >>> VFIO migration. It will also be maintained by a smaller group of engineers.
> >>
> >>> If you would like to use vDPA, I suggest using virtio-blk rather than
> >>> inventing a virtio-nvme device that will certainly have a smaller feature set
> >>> than pure NVMe.
> >>
> >>
> >>> In case you're missing some feature in virtio-blk that exists in NVMe,
> >>> you're welcome to submit a proposal to the technical group with that
> >>> feature.
> >>
> >> Yes, this is good advice.
> >> I wonder whether it is feasible to add Fabrics-related commands to virtio-blk
> >> so that it supports virtio-blk-of (over Fabrics).
> >
> >I'm totally confused.
> >I thought you were trying to build some virtualized environment and
> >were looking for storage devices that support live migration.
> >How will virtio-blk-of assist here?
>
> I would like to promote virtio storage in hardware offloading scenarios,
> enabling open-source solutions that support remote storage access.
> That is why we are discussing the need for virtio-nvme and virtio-blk-of.
If I understand correctly you're saying the guest driver needs to speak
the same protocol as the remote storage?
That's a good idea for local storage because it avoids extra layers of
software that parses/translates commands.
However, I don't understand why it matters for remote storage because
commands need to be parsed by the DPU and sent as messages over a
fabric anyway. Whether you go virtio-blk<->NVMeoF,
virtio-blk<->virtio-blk-of, or nvme-pci<->NVMeoF, it's still the same
path. None of them presents a significant optimization opportunity.
The main optimization is to configure some sort of RDMA to avoid copying
around I/O buffers, but the buffers only contain data and are not
protocol-specific so virtio-blk<->NVMeoF should work.
Can you explain what you wrote in a bit more detail, I don't understand
why virtio-blk-of is needed?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-19 11:42 ` Michael S. Tsirkin
@ 2023-01-29 3:32 ` 侯英乐
2023-01-30 20:30 ` Stefan Hajnoczi
0 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-01-29 3:32 UTC (permalink / raw)
To: Michael S. Tsirkin, Stefan Hajnoczi
Cc: virtio-comment, Christoph Hellwig, Keith Busch, Kevin Wolf,
Klaus Jensen, sgarzare
Thu, 19 Jan 2023 06:42:49 -0500, Michael S. Tsirkin wrote:
>On Tue, Jan 17, 2023 at 11:01:37AM -0500, Stefan Hajnoczi wrote:
>> > The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
>> >
>> >
>> > >Let's not forget that virtio-blk is widely used and new commands are
>> > >being added as needed. Which NVMe features are you missing in
>> > >virtio-blk?
>> >
>> > With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
>>
>> virtio-blk is just a storage interface, whether that storage is local or
>> remote is up to the device implementation. The block device could be
>> located on Ceph, NFS, etc.
>>
>> Each virtio-blk device is a single block device. There is no
>> standardized management protocol in virtio-blk for connecting to remote
>> block devices. I'm aware of hardware virtio-blk devices that connect to
>> remote storage. Configuration is performed through an out-of-band
>> management interface.
>>
>> Maybe when you say virtio-blk doesn't support remote storage this is
>> what you mean?
>>
>> Stefan
>
>侯英乐, you never answered this question.
All the topics revolve around the scenario where the virtio device is offloaded to the DPU.
Virtio has a front-end and a back-end. What I mean is that I want to implement the virtio-blk back-end on the DPU.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-29 3:32 ` 侯英乐
@ 2023-01-30 20:30 ` Stefan Hajnoczi
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Hajnoczi @ 2023-01-30 20:30 UTC (permalink / raw)
To: 侯英乐
Cc: Michael S. Tsirkin, virtio-comment, Christoph Hellwig,
Keith Busch, Kevin Wolf, Klaus Jensen, sgarzare
[-- Attachment #1: Type: text/plain, Size: 2092 bytes --]
On Sun, Jan 29, 2023 at 11:32:43AM +0800, 侯英乐 wrote:
> Thu, 19 Jan 2023 06:42:49 -0500, Michael S. Tsirkin wrote:
> >On Tue, Jan 17, 2023 at 11:01:37AM -0500, Stefan Hajnoczi wrote:
> >> > The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
> >> >
> >> >
> >> > >Let's not forget that virtio-blk is widely used and new commands are
> >> > >being added as needed. Which NVMe features are you missing in
> >> > >virtio-blk?
> >> >
> >> > With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
> >>
> >> virtio-blk is just a storage interface, whether that storage is local or
> >> remote is up to the device implementation. The block device could be
> >> located on Ceph, NFS, etc.
> >>
> >> Each virtio-blk device is a single block device. There is no
> >> standardized management protocol in virtio-blk for connecting to remote
> >> block devices. I'm aware of hardware virtio-blk devices that connect to
> >> remote storage. Configuration is performed through an out-of-band
> >> management interface.
> >>
> >> Maybe when you say virtio-blk doesn't support remote storage this is
> >> what you mean?
> >>
> >> Stefan
> >
> >侯英乐, you never answered this question.
>
> All the topics revolve around the scenario where the virtio device is offloaded to the DPU.
> Virtio has a front-end and a back-end. What I mean is that I want to implement the virtio-blk back-end on the DPU.
You can do that today and the underlying storage can be remote.
Can you explain what you mean when you say virtio-blk doesn't support
remote storage?
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-01-18 2:49 ` 侯英乐
@ 2023-02-05 12:33 ` Michael S. Tsirkin
2023-02-09 3:28 ` 侯英乐
0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2023-02-05 12:33 UTC (permalink / raw)
To: 侯英乐
Cc: Stefan Hajnoczi, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare
On Wed, Jan 18, 2023 at 10:49:23AM +0800, 侯英乐 wrote:
> On Tue, 17 Jan 2023 11:01:37 -0500, Stefan wrote:
>
> >On Tue, Jan 17, 2023 at 10:04:07AM +0800, 侯英乐 wrote:
>
> >> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
> >>
> >> >>On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
> >> >> As we know, NVMe has more features than virtio-blk. For example, with the development of virtualization I/O offloading to hardware, virtio-blk and NVMe-oF offloading to hardware are developing rapidly. So if virtio and NVMe are combined into virtio-nvme, is it necessary to add a virtio-nvme device type?
> >>
> >> >Hi,
> >> >In theory, yes, virtio-nvme can be done. The question is why do it?
> >>
> >> >NVMe already provides a PCI hardware spec for software and hardware
> >> >implementations to follow. An NVMe PCI device can be exposed to the
> >> >guest and modern operating systems recognize it without requiring new
> >> >drivers.
> >>
> >> >The value of VIRTIO here is probably in the deep integration into the
> >> >virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
> >> >all these things whereas a PCI device needs to do everything from
> >> >scratch.
> >>
> >> The NVMe technology and ecosystem are complete. However, in virtualization scenarios, NVMe devices can only use PCIe pass-through. When NVMe and virtio are combined and connected to the vDPA ecosystem, live migration can be supported.
> >>
> >> >Let's not forget that virtio-blk is widely used and new commands are
> >> >being added as needed. Which NVMe features are you missing in
> >> >virtio-blk?
> >>
> >> With the introduction of the DPU concept, a large number of vendors are offloading virtual devices to hardware. The back-end of virtio-blk does not support remote storage. Therefore, virtio-nvme-of can combine the advantages of remote storage and virtio live migration.
> >
> >virtio-blk is just a storage interface, whether that storage is local or
> >remote is up to the device implementation. The block device could be
> >located on Ceph, NFS, etc.
> >
> >Each virtio-blk device is a single block device. There is no
> >standardized management protocol in virtio-blk for connecting to remote
> >block devices. I'm aware of hardware virtio-blk devices that connect to
> >remote storage. Configuration is performed through an out-of-band
> >management interface.
> >
> >Maybe when you say virtio-blk doesn't support remote storage this is
> >what you mean?
>
> Yes, I mean virtio-blk devices offloaded to hardware, for example to DPUs and SmartNICs.
> So, compare virtio-nvme with virtio-blk and NVMe.
>
> Stefan
>
Let's circle back a bit. You asked:
So if virtio and NVMe are combined into virtio-nvme, is it
necessary to add a virtio-nvme device type?
And the answer seems to be:
not without much more in the way of motivation.
--
MST
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-02-05 12:33 ` Michael S. Tsirkin
@ 2023-02-09 3:28 ` 侯英乐
2023-02-17 3:01 ` Parav Pandit
0 siblings, 1 reply; 40+ messages in thread
From: 侯英乐 @ 2023-02-09 3:28 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stefan Hajnoczi, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare
On Sun, 5 Feb 2023 07:33:17 -0500, Michael S. Tsirkin wrote:
>On Wed, Jan 18, 2023 at 10:49:23AM +0800, 侯英乐 wrote:
>> On Tue, 17 Jan 2023 11:01:37 -0500, Stefan wrote:
>>
>> >On Tue, Jan 17, 2023 at 10:04:07AM +0800, 侯英乐 wrote:
>>
>> >> On Wed, 11 Jan 2023 10:16:55 -0500, Stefan wrote:
>> >>
>> >>
>> >> >>On Wed, Jan 11, 2023 at 11:21:35AM +0800, 侯英乐 wrote:
>> >> >> As we know, nvme has more features than virtio-blk. For example, with the development of virtualization IO offloading to hardware, virtio-blk and NVME-OF offloading to hardware >are developing rapidly. So if virtio and nvme are combined into Virtio-NvMe, Is it necessary to add a device type Virtio-NvMe ?
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> >Hi,
>> >> >In theory, yes, virtio-nvme can be done. The question is why do it?
>> >>
>> >>
>> >>
>> >> >NVMe already provides a PCI hardware spec for software and hardware
>> >> >implementations to follow. An NVMe PCI device can be exposed to the
>> >> >guest and modern operating systems recognize it without requiring new
>> >> >drivers.
>> >>
>> >>
>> >> >The value of VIRTIO here is probably in the deep integration into the
>> >> >virtualization stack with vDPA, vhost, etc. A virtio-nvme device can use
>> >> >all these things whereas a PCI device needs to do everything from
>> >> >scratch.
>> >>
>> >> The NVME technology and ecosystem are complete. However, in virtualization scenarios, NVME devices can only use PCIe pass-through . When NVME and virtio combine to connect to the vDPA ecosystem, live migration is supported.
>> >>
>> >>
>> >> >Let's not forget that virtio-blk is widely used and new commands are
>> >> >being added as needed. Which NVMe features are you missing in
>> >> >virtio-blk?
>> >>
>> >> With the introduction of the concept of DPU, a large number of vendors are offloading virtual devices to hardware. The back-end of Virtio-blk does not support remote storage. Therefore, Virtio-Nvme-of can well combine the advantages of remote storage and virtio live migration
>> >
>> >virtio-blk is just a storage interface, whether that storage is local or
>> >remote is up to the device implementation. The block device could be
>> >located on Ceph, NFS, etc.
>> >
>> >Each virtio-blk device is a single block device. There is no
>> >standardized management protocol in virtio-blk for connecting to remote
>> >block devices. I'm aware of hardware virtio-blk devices that connect to
>> >remote storage. Configuration is performed through an out-of-band
>> >management interface.
>> >
>> >Maybe when you say virtio-blk doesn't support remote storage this is
>> >what you mean?
>>
>>
>>
>> Yes, I mean virtio-blk devices offloaded to hardware, for example DPUs and SmartNICs.
>> So the comparison is between virtio-nvme on one side and virtio-blk and NVMe on the other.
>>
>> Stefan
>>
>
>Let's circle back a bit. You asked:
>So if virtio and nvme are combined into Virtio-NvMe, Is it
>necessary to add a device type Virtio-NvMe ?
>And the answer seems to be:
>not without much more in the way of motivation.
Yes, the conclusion is this.
Thanks!
----
Leo
* RE: Re: [virtio-comment] About adding a new device type virtio-nvme
2023-02-09 3:28 ` 侯英乐
@ 2023-02-17 3:01 ` Parav Pandit
0 siblings, 0 replies; 40+ messages in thread
From: Parav Pandit @ 2023-02-17 3:01 UTC (permalink / raw)
To: 侯英乐, Michael S. Tsirkin
Cc: Stefan Hajnoczi, virtio-comment, Christoph Hellwig, Keith Busch,
Kevin Wolf, Klaus Jensen, sgarzare
> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-open.org> On Behalf Of 侯英乐
> >So if virtio and nvme are combined into Virtio-NvMe, Is it
> >necessary to add a device type Virtio-NvMe ?
> >And the answer seems to be:
> >not without much more in the way of motivation.
>
> Yes, the conclusion is this.
Since this has now concluded, can you please close the GitHub issue [1]?
[1] https://github.com/oasis-tcs/virtio-spec/issues/149
Thread overview: 40+ messages
2023-01-11 3:21 [virtio-comment] About adding a new device type virtio-nvme 侯英乐
2023-01-11 15:16 ` Stefan Hajnoczi
2023-01-17 2:04 ` 侯英乐
2023-01-17 8:32 ` David Hildenbrand
2023-01-17 9:30 ` 侯英乐
[not found] ` <202301171730174296359@sudoinfotech.com>
2023-01-17 9:41 ` 侯英乐
2023-01-17 15:34 ` Stefan Hajnoczi
2023-01-17 15:47 ` David Hildenbrand
2023-01-18 2:38 ` 侯英乐
2023-01-18 2:15 ` 侯英乐
2023-01-18 14:08 ` Stefan Hajnoczi
2023-01-19 8:31 ` 侯英乐
2023-01-18 14:14 ` Stefan Hajnoczi
2023-01-19 3:40 ` Jason Wang
2023-01-19 16:59 ` Stefan Hajnoczi
2023-01-19 9:03 ` 侯英乐
2023-01-19 17:03 ` Stefan Hajnoczi
2023-01-19 3:38 ` Jason Wang
2023-01-19 7:22 ` 侯英乐
2023-01-17 16:01 ` Stefan Hajnoczi
[not found] ` <20230117162114.GA24976@lst.de>
2023-01-17 16:53 ` Stefan Hajnoczi
2023-01-18 11:22 ` Michael S. Tsirkin
2023-01-18 2:49 ` 侯英乐
2023-02-05 12:33 ` Michael S. Tsirkin
2023-02-09 3:28 ` 侯英乐
2023-02-17 3:01 ` Parav Pandit
2023-01-19 11:42 ` Michael S. Tsirkin
2023-01-29 3:32 ` 侯英乐
2023-01-30 20:30 ` Stefan Hajnoczi
2023-01-17 17:19 ` Max Gurtovoy
2023-01-18 3:23 ` 侯英乐
2023-01-18 10:09 ` Max Gurtovoy
2023-01-18 11:12 ` Michael S. Tsirkin
2023-01-18 11:27 ` Max Gurtovoy
2023-01-18 13:29 ` Michael S. Tsirkin
2023-01-19 10:19 ` 侯英乐
2023-01-19 10:33 ` Max Gurtovoy
2023-01-19 11:02 ` 侯英乐
2023-01-19 17:15 ` Stefan Hajnoczi
2023-01-18 10:28 ` Michael S. Tsirkin