From: Stefan Hajnoczi <stefanha@redhat.com>
To: 侯英乐 <houyingle@sudoinfotech.com>
Cc: jasowang <jasowang@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	virtio-comment <virtio-comment@lists.oasis-open.org>,
	Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	Kevin Wolf <kwolf@redhat.com>,
	Klaus Jensen <k.jensen@samsung.com>,
	sgarzare <sgarzare@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: Re: [virtio-comment] About adding a new device type virtio-nvme
Date: Wed, 18 Jan 2023 09:08:36 -0500
Message-ID: <Y8f9ZLgPP2Ou9qWP@fedora>
In-Reply-To: <CB1D0D0BB5D3E5E6+2023011810151168808928@sudoinfotech.com>

On Wed, Jan 18, 2023 at 10:15:12AM +0800, 侯英乐 wrote:
> On Tue, 17 Jan 2023 10:34:09 -0500, Stefan wrote:
> 
> >On Tue, Jan 17, 2023 at 05:41:57PM +0800, 侯英乐 wrote:
> >> On Tue, 17 Jan 2023 09:32:05 +0100,David wrote:
> >> >On 17.01.23 03:04, 侯英乐 wrote:
> >>
> >>
> >>
> >> >> virtio-nvme advantages :
> >> >> 1)  live migration
> >> >> 2)  support remote storage
> >>
> >>
> >>
> >> >At least 1) is an implementation detail in the NVME implementation in
> >> >the hypervisor. I suspect 2) in a similar way, or is there a fundamental
> >> >issue with that?
> >>
> >>
> >>
> >> >One problematic thing about the NVME implementation in QEMU is that it
> >> >will pin (via vfio) all guest RAM. Could that be avoided using
> >> >virtio-NVME, or what exactly would be the difference between virtio-nvme
> >> >and ordinary NVME?
> >>
> >>
> >>
> >> In the virtualization scenario where devices are offloaded to hardware:
> >>
> >>
> >> NVME:
> >> ---------------------------------------------------------------------------------------------------------------------
> >>            _____________________________________________________________________________                            
> >>           |             ___________________________________________________________     |                           
> >>           |            |    _____________________________________________________  |    |                           
> >>           |            |    |                                                    | |    |                           
> >>           |            |    |             __________________________________     | |    |                           
> >>           |            |    |            |    ______                        |    | |    |        ______             
> >>           |            |    |   User     |   | Mem  |-----------------------|----|-|----|-----> |      |            
> >>           |            |    |            |   |______|        SPDK           |    | |    | (gVA) |______|            
> >>           |            |    |            |    (gVA)                         |    | |    |        |    |             
> >>           |            |    |            |______|___________________________|    | |    |        |    |             
> >>           |            |    |--GuestOS----------|--------------------------------| |    |        |    |             
> >>           |            |    |             ______\/__________________________     | |    |        |    |             
> >>           |            | VM |            |                VFIO              |    | |    |        |    |             
> >>           |   User     |    |   Kernel   |___________     __________________|    | |    |        |    |             
> >>           |            |    |            | vfio-pci  |   |                  |    | |    |        |    |             
> >>           |            |    |            |  (gIOVA)  |   | vfio_iommu_type1 |    | |    |        |    |             
> >> Software  |            |    |            |______|____|___|__________________|    | |    |        |    |             
> >>           |            |    |___________________|________________________________| |    |        |    |             
> >>           |            |     ___________________|________________________________  |    |        |    |             
> >>           |            |    |             ______|____     __________________     | |    |        \/   \/            
> >>           |            |    |            |      \/   |   |                  |    | |    |        ______             
> >>           |            |    |            |    NVME   |   |      vIOMMU      |    | |    |       |      |            
> >>           |            |    |   QEMU     |  Instance |   |                --|----|-|----|-----> |______|            
> >>           |            |    |            |  (gIOVA)  |   |   (gIOVA-->gPA)  |    | |    | (gPA)  |    |             
> >>           |            |    |            |_____|_____|   |__________________|    | |    |        |    |             
> >>           |            |    |__________________|_________________________________| |    |        |    |             
> >>           |            |_______________________|___________________________________|    |        |    |             
> >>           |---HostOS---------------------------|----------------------------------------|        |    |             
> >>           |             _______________________|___________________________________     |        |    |             
> >>           |            |                       |                                   |    |        |    |             
> >>           |            |                       |    VFIO                           |    |        |    |             
> >>           |   Kernel   |_______________________|_____     _________________________|    |        |    |             
> >>           |            |                       \/    |   |                         |    |        |    |             
> >>           |            |        vfio-pci             |   |    vfio_iommu_type1     |    |        |    |             
> >>           |            |                       |     |   |                         |    |        |    |             
> >>           |            |_______________________|_____|___|_________________________|    |        |    |             
> >>           |____________________________________|________________________________________|        |    |             
> >> -----------------------------------------------|-------------------------------------------------|----|--------------
> >>            ____________________________________\/____     _________________________           ___\/___\/___________ 
> >>           |       |    |                             |   |                         |         |  |      |           |
> >> Hardware  |       |    |          DMA     (gIOVA)  --|---|->        IOMMU        --|---------|->|______|           |
> >>           |       |    |_____________________________|   |      (gIOVA-->hPA)      |   (hPA) |   Physical Memory   |
> >>           | DPU   |                                  |   |_________________________|         |_____________________|
> >>           |       |        NVME-of                   |
> >>           |       |__________________________________|
> >>           |                           |              |
> >>           |___________________________|______________|
> >> --------------------------------------|------------------------------------------------------------------------------
> >>                                       | TCP (RDMA, and so on)
> >>                         ______________v__________
> >>                        |                         |
> >> Remote storage         |                         |
> >>                        |     Network Storage     |
> >>                        |                         |
> >>                        |_________________________|
> >>                       
> >> ---------------------------------------------------------------------------------------------------------------------
> >>
> >>
> >> It is difficult to implement PCIe passthrough live migration.
>
> >Linux commit 115dcec65f61d53e25e1bed5e380468b30f98b14 ("vfio: Define
> >device migration protocol v2") defines the VFIO migration API and it's
> >implemented by several drivers in the kernel.
> 
> 
> Yes, this commit adds VFIO live migration support, but the feature is still
> a work in progress. See this recent submission:
> https://lore.kernel.org/all/20230116141135.12021-10-avihaih@nvidia.com/
>
> 
> >Can you explain the difficulty of implementing PCIe passthrough live
> >migration in more detail?
> 
> VFIO live migration requires the IOMMU to support dirty page tracking.
> Currently, no IOMMU device supports this feature, so VFIO live migration
> will take a long time.
> For details, see: https://www.qemu.org/docs/master/devel/vfio-migration.html

Can physical devices do their own dirty page tracking in the
meantime, since they know which pages are being written to?
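
To make the question concrete, here is a minimal sketch of what
device-side dirty page tracking could look like, assuming the device logs
every DMA write into a per-page bitmap that the migration code reads and
clears on each pre-copy pass. The names (DirtyBitmap, mark_dma_write,
read_and_clear) and the 4 KiB page size are assumptions for illustration
only; this is not the VFIO uAPI or any real driver interface.

```python
PAGE_SIZE = 4096  # assumed page granularity for this sketch


class DirtyBitmap:
    """Hypothetical per-device dirty log: one bit per guest page."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = bytearray((num_pages + 7) // 8)

    def mark_dma_write(self, iova, length):
        """Record every page touched by a device write to [iova, iova + length)."""
        if length <= 0:
            return
        first = iova // PAGE_SIZE
        last = (iova + length - 1) // PAGE_SIZE
        if iova < 0 or last >= self.num_pages:
            raise ValueError("DMA write outside the tracked range")
        for page in range(first, last + 1):
            self.bits[page // 8] |= 1 << (page % 8)

    def read_and_clear(self):
        """Return the dirty page numbers in ascending order and reset the log."""
        dirty = [
            page
            for page in range(self.num_pages)
            if self.bits[page // 8] & (1 << (page % 8))
        ]
        self.bits = bytearray(len(self.bits))
        return dirty
```

With a log like this, each pre-copy pass would only need to resend the
pages returned by read_and_clear(). Without any dirty tracking, the
migration code has to treat every pinned page as dirty on every pass.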

I have CCed Alex Williamson regarding VFIO.

Stefan
