public inbox for kvm@vger.kernel.org
* [PATCH RFC 00/11] nvmet: Add NVMe target mdev/vfio driver
@ 2025-03-13  5:18 Mike Christie
From: Mike Christie @ 2025-03-13  5:18 UTC
  To: chaitanyak, kbusch, hch, sagi, joao.m.martins, linux-nvme, kvm,
	kwankhede, alex.williamson, mlevitsk

The following patches were made over Linus's tree. They implement
a virtual PCI NVMe device using mdev/vfio. The device can be used
by QEMU and in the guest will look like a normal old local PCI
NVMe drive.

They are based on Maxim Levitsky's mdev patches:

https://lore.kernel.org/lkml/20190506125752.GA5288@lst.de/t/

but instead of trying to export a physical NVMe device to a guest, they
are only focused on exporting a virtual device using the nvmet layer.

Why another driver when we have so many? Performance.
=====================================================
Without any tuning, and with major locks still in the main IO path, 4K IOPS
for a single controller with a single namespace are higher than with the
kernel vhost-scsi driver and the SPDK vhost-scsi/blk user drivers when using
a lower number of queues/cpus/jobs. At just 2 queues, we are able to hit 1M
IOPS:

Note: the nvme mdev values below have the shadow doorbell enabled

numjobs  mdev   vhost-scsi  vhost-scsi-usr  vhost-blk-usr
1        518K   198K        332K            301K
2        1037K  363K        609K            664K
4        974K   633K        1369K           1383K
8        813K   1788K       1358K           1363K

However, by default we can't scale. But when mdev is tuned to pre-pin pages
(this requires patches to the vfio layer), it also performs better at both
lower and higher numbers of queues/cpus/jobs, reaching 2.3M IOPS with only
4 cpus/queues used:

numjobs  mdev
1        505K
2        1037K
4        2375K
8        2162K
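
To make the scaling behaviour of the mdev numbers easier to see, here is a
small Python sketch that computes the speedup relative to numjobs=1 and the
per-job efficiency (speedup divided by numjobs). The IOPS values are copied
from the two mdev columns above; the helper name scaling() and the
"efficiency" metric are only for illustration and are not part of the
patchset.

```python
# IOPS in thousands, copied from the mdev columns above (numjobs -> K IOPS).
MDEV_DEFAULT = {1: 518, 2: 1037, 4: 974, 8: 813}
MDEV_PREPINNED = {1: 505, 2: 1037, 4: 2375, 8: 2162}


def scaling(results):
    """Return {numjobs: (speedup vs. 1 job, speedup / numjobs)}."""
    base = results[1]
    return {
        jobs: (iops / base, iops / base / jobs)
        for jobs, iops in sorted(results.items())
    }


if __name__ == "__main__":
    for name, results in (("default", MDEV_DEFAULT),
                          ("pre-pinned", MDEV_PREPINNED)):
        print(name)
        for jobs, (speedup, eff) in scaling(results).items():
            print(f"  numjobs={jobs}: {speedup:.2f}x speedup, "
                  f"{eff:.0%} per-job efficiency")
```

With the default setup the speedup peaks at 2 jobs and drops after that,
while the pre-pinned run keeps scaling through 4 jobs before dipping at 8.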

If we agree on a new virtual NVMe driver being ok, why mdev vs vhost?
=====================================================================
The problems with a vhost nvme are:

2.1. If we do a fully vhost nvmet solution, it will require new guest
drivers that present NVMe interfaces to userspace and then implement the
vhost spec on the backend, like vhost-scsi does.

I don't want to implement a Windows or even a Linux nvme vhost
driver. I don't think anyone wants the extra headache.

2.2. We can do a hybrid approach where in the guest it looks like we
are a normal old local NVMe drive and use the guest's native NVMe driver.
However, in QEMU we would need a vhost nvme module that, instead of using
vhost virtqueues, handles virtual PCI memory accesses, as well as a vhost
nvme kernel or user driver to process IO.

So not as much extra code as option 2.1, since we don't have to worry about
the guest, but it still needs extra QEMU code.

3. The mdev based solution does not have these drawbacks as it can
look like a normal old local NVMe drive to the guest and can use QEMU's
existing vfio layer. So it just requires the kernel driver.

Why not a new blk driver or why not vdpa blk?
=============================================
Applications want standardized interfaces for things like persistent
reservations. They have to support them with SCSI and NVMe already
and don't want to have to support a new virtio block interface.

Also the nvmet-mdev-pci driver in this patchset can perform as well
as SPDK vhost blk, so that no longer has the perf advantage it used
to.

Status
======
This patchset is RFC quality only. You can discover a drive and do
IO but it's not stable. There are several TODO items mentioned in the
last patch. However, I think the patches are at the point where I
want to get some feedback about whether this is even acceptable, because
the last time they were posted some people did not like how
they hooked into drivers/nvme/host (this has been fixed in this
posting). There are some other open issues, like:

1. Should the driver integrate with pci-epf (the drivers work very
differently but could share some code)?

2. Should it try to fit into the existing configfs interface or implement
its own like pci-epf did? I made an attempt at this but it feels
wrong.






Thread overview: 29+ messages
2025-03-13  5:18 [PATCH RFC 00/11] nvmet: Add NVMe target mdev/vfio driver Mike Christie
2025-03-13  5:18 ` [PATCH RFC 01/11] nvmet: Remove duplicate uuid_copy Mike Christie
2025-03-13  6:36   ` Christoph Hellwig
2025-03-13  8:59   ` Damien Le Moal
2025-03-13 17:20   ` Keith Busch
2025-03-13  5:18 ` [PATCH RFC 02/11] nvmet: Export nvmet_add_async_event and add definitions Mike Christie
2025-03-13  6:36   ` Christoph Hellwig
2025-03-13 17:50     ` Mike Christie
2025-03-13  5:18 ` [PATCH RFC 03/11] nvmet: Add nvmet_fabrics_ops flag to indicate SGLs not supported Mike Christie
2025-03-13  6:37   ` Christoph Hellwig
2025-03-13  9:02   ` Damien Le Moal
2025-03-13  9:13     ` Christoph Hellwig
2025-03-13  9:16       ` Damien Le Moal
2025-03-13 17:19         ` Mike Christie
2025-03-13  5:18 ` [PATCH RFC 04/11] nvmet: Add function to get nvmet_fabrics_ops from trtype Mike Christie
2025-03-13  9:03   ` Damien Le Moal
2025-03-13  5:18 ` [PATCH RFC 05/11] nvmet: Add function to print trtype Mike Christie
2025-03-13  5:18 ` [PATCH RFC 06/11] nvmet: Allow nvmet_alloc_ctrl users to specify the cntlid Mike Christie
2025-03-13  5:18 ` [PATCH RFC 07/11] nvmet: Add static controller support to configfs Mike Christie
2025-03-13  5:18 ` [PATCH RFC 08/11] nvmet: Add shadow doorbell support Mike Christie
2025-03-13  5:18 ` [PATCH RFC 09/11] nvmet: Add helpers to find and get static controllers Mike Christie
2025-03-13  5:18 ` [PATCH RFC 10/11] nvmet: Add addr fam and trtype for mdev pci driver Mike Christie
2025-03-13  6:42   ` Christoph Hellwig
2025-03-13 17:56     ` Mike Christie
2025-03-13  5:18 ` [PATCH RFC 11/11] nvmet: Add nvmet-mdev-pci driver Mike Christie
2025-03-13  5:32 ` [PATCH RFC 00/11] nvmet: Add NVMe target mdev/vfio driver Damien Le Moal
2025-03-13  6:47 ` Christoph Hellwig
2025-03-13 17:17   ` Mike Christie
2025-03-14  8:31     ` Hannes Reinecke
