kvm.vger.kernel.org archive mirror
* [RFC PATCH 0/4] Add new VFIO PCI driver for NVMe devices
@ 2025-08-03  2:47 Chaitanya Kulkarni
  2025-08-03  2:47 ` [RFC PATCH 1/4] vfio-nvme: add vfio-nvme lm driver infrastructure Chaitanya Kulkarni
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2025-08-03  2:47 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi, alex.williamson, cohuck, jgg, yishaih,
	shameerali.kolothum.thodi, kevin.tian, mjrosato, mgurtovoy
  Cc: linux-nvme, kvm, Konrad.wilk, martin.petersen, jmeneghi, arnd,
	schnelle, bhelgaas, joao.m.martins, Chaitanya Kulkarni

Hi,

Some devices, such as Infrastructure Processing Units (IPUs),
Data Processing Units (DPUs), and SSDs, expose SR-IOV-capable NVMe
devices to the host. These virtual function (VF) devices support live
migration via specific NVMe admin commands issued through the parent
PF's admin queue.

NVMe TP4159 defines support for basic live migration operations,
including Suspend, Resume, Get Controller State, and Set Controller
State. While TP4159 standardizes the command interface, it does not
yet define a fixed layout for controller state; NVIDIA and others
in the NVMe TWG are actively working on defining this layout.

This series introduces a vfio-pci driver to enable live migration of
SR-IOV NVMe devices. It also adds interface hooks to the core NVMe
driver to allow VF command submission through the PF's admin queue.
Support for migrating non-SR-IOV devices can be added
incrementally.

This RFC complies with the TP4159 specification and is derived from
the initial submission of Intel and NVIDIA’s vendor-specific
implementation.

Objective for this RFC
----------------------

Our initial submission received feedback encouraging standardization
of live migration support for NVMe. In response, NVIDIA and Intel
collaborated to merge architectural elements from TP4173 into TP4159.

Now that TP4159 has been ratified with core live migration commands,
we aim to resume discussion with the upstream community and solicit
feedback on what remains to support NVMe live migration in mainline.

What is implemented in this RFC?
--------------------------------

1. Patch 0001 introduces the core vfio-nvme driver infrastructure
   including helper routines and basic driver registration.

2. Patch 0002 adds TP4159-specific command definitions and updates
   existing NVMe data structures, such as `nvme_id_ctrl`.

3. Patch 0003 exports helpers from the nvme-pci driver (needs
   discussion).

4. Patch 0004 implements the TP4159 commands: Suspend, Resume,
   Get Controller State, and Set Controller State. It also includes
   debug helpers and command parsing logic.
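
For readers unfamiliar with the four operations, the order in which a
migration flow might issue them can be sketched as follows. This is an
illustration only and is not code from this series: every name in it
(lm_ctrl, lm_suspend, lm_get_state, lm_set_state, lm_resume, lm_migrate)
is hypothetical, the state buffer is opaque because TP4159 does not fix
its layout, and the rule that state is read and written only while the
controller is suspended is an assumption of this model.

```c
/* Illustrative model only: NOT code from this series. All names are
 * hypothetical; the opaque byte buffer stands in for controller state,
 * whose layout TP4159 does not define. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LM_STATE_MAX 64

enum lm_run_state { LM_RUNNING, LM_SUSPENDED };

struct lm_ctrl {
	enum lm_run_state run;
	unsigned char state[LM_STATE_MAX];
	size_t state_len;
};

/* Suspend: stop the controller so its state no longer changes. */
static int lm_suspend(struct lm_ctrl *c)
{
	if (c->run != LM_RUNNING)
		return -1;
	c->run = LM_SUSPENDED;
	return 0;
}

/* Get Controller State: assumed valid only while suspended. */
static int lm_get_state(const struct lm_ctrl *c, unsigned char *buf,
			size_t cap, size_t *len)
{
	if (c->run != LM_SUSPENDED || cap < c->state_len)
		return -1;
	memcpy(buf, c->state, c->state_len);
	*len = c->state_len;
	return 0;
}

/* Set Controller State: assumed valid only while suspended. */
static int lm_set_state(struct lm_ctrl *c, const unsigned char *buf,
			size_t len)
{
	if (c->run != LM_SUSPENDED || len > LM_STATE_MAX)
		return -1;
	memcpy(c->state, buf, len);
	c->state_len = len;
	return 0;
}

/* Resume: let the controller process commands again. */
static int lm_resume(struct lm_ctrl *c)
{
	if (c->run != LM_SUSPENDED)
		return -1;
	c->run = LM_RUNNING;
	return 0;
}

/* Source: Suspend, Get State. Destination: Set State, Resume. */
static int lm_migrate(struct lm_ctrl *src, struct lm_ctrl *dst)
{
	unsigned char buf[LM_STATE_MAX];
	size_t len = 0;

	assert(src && dst);
	if (lm_suspend(src))
		return -1;
	if (lm_get_state(src, buf, sizeof(buf), &len))
		return -1;
	if (lm_set_state(dst, buf, len))
		return -1;
	return lm_resume(dst);
}
```

In this model the destination controller starts out suspended, and the
source stays suspended after a successful migration, so a second
lm_migrate() call on the same source fails at the Suspend step.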

Open Issues and Discussion Points
---------------------------------

1. This RFC exposes two new interfaces from the nvme-pci driver to
   submit admin commands for VF devices through the PF. We welcome
   input on the correct or preferred upstream approach for this.

2. Are there any gaps between the current VFIO live migration
   architecture and what is required to fully support NVMe VF
   migration?

3. TP4193 is under development in the NVMe TWG; it will define
   subsystem state and the missing configuration functionality. Are
   there additional capabilities or architecture changes, beyond what
   TP4193 will cover, that are needed to upstream VFIO NVMe live
   migration support, from either the spec or the Linux kernel point
   of view?

NVIDIA and Intel have started the NVMe live migration upstreaming work
and are fully committed to upstreaming NVMe live migration support. We
are also eager to align ongoing development with community expectations
and to bring that feedback to the standards body on behalf of the
kernel community.

This RFC was compiled and generated on the linux-nvme tree, branch
nvme-6.17, at HEAD:

commit 70d12a283303b1241884b04f77dc1b07fdbbc90e (origin/nvme-6.17)
Author: Maurizio Lombardi <mlombard@redhat.com>
Date:   Wed Jul 2 16:06:29 2025 +0200

    nvme-tcp: log TLS handshake failures at error level

We greatly appreciate your feedback and comments on this work.

-ck

Chaitanya Kulkarni (4):
  vfio-nvme: add vfio-nvme lm driver infrastructure
  nvme: add live migration TP 4159 definitions
  nvme: export helpers to implement vfio-nvme lm
  vfio-nvme: implement TP4159 live migration cmds

 drivers/nvme/host/core.c       |    5 +-
 drivers/nvme/host/nvme.h       |    5 +
 drivers/nvme/host/pci.c        |   34 ++
 drivers/vfio/pci/Kconfig       |    2 +
 drivers/vfio/pci/Makefile      |    2 +
 drivers/vfio/pci/nvme/Kconfig  |   10 +
 drivers/vfio/pci/nvme/Makefile |    6 +
 drivers/vfio/pci/nvme/nvme.c   | 1036 ++++++++++++++++++++++++++++++++
 drivers/vfio/pci/nvme/nvme.h   |   39 ++
 include/linux/nvme.h           |  334 +++++++++-
 10 files changed, 1471 insertions(+), 2 deletions(-)
 create mode 100644 drivers/vfio/pci/nvme/Kconfig
 create mode 100644 drivers/vfio/pci/nvme/Makefile
 create mode 100644 drivers/vfio/pci/nvme/nvme.c
 create mode 100644 drivers/vfio/pci/nvme/nvme.h

-- 
2.40.0




Thread overview: 9+ messages
2025-08-03  2:47 [RFC PATCH 0/4] Add new VFIO PCI driver for NVMe devices Chaitanya Kulkarni
2025-08-03  2:47 ` [RFC PATCH 1/4] vfio-nvme: add vfio-nvme lm driver infrastructure Chaitanya Kulkarni
2025-08-04 15:20   ` Shameerali Kolothum Thodi
2025-08-04 16:43   ` Bjorn Helgaas
2025-08-04 17:15   ` Alex Williamson
2025-08-03  2:47 ` [RFC PATCH 2/4] nvme: add live migration TP 4159 definitions Chaitanya Kulkarni
2025-08-03  2:47 ` [RFC PATCH 3/4] nvme: export helpers to implement vfio-nvme lm Chaitanya Kulkarni
2025-08-03  2:47 ` [RFC PATCH 4/4] vfio-nvme: implement TP4159 live migration cmds Chaitanya Kulkarni
2025-08-04 16:41   ` Bjorn Helgaas
