* [PATCH 0/1] nvmet: add basic in-memory backend support
@ 2025-11-04 8:06 Chaitanya Kulkarni
2025-11-04 8:06 ` [PATCH 1/1] " Chaitanya Kulkarni
2025-11-04 10:36 ` [PATCH 0/1] " Hannes Reinecke
0 siblings, 2 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-04 8:06 UTC (permalink / raw)
To: linux-nvme; +Cc: hch, sagi, kch, Chaitanya Kulkarni
Hi,
Add a new memory backend (io-cmd-mem.c) that provides RAM-backed storage
for NVMe target namespaces, enabling high-performance volatile storage
without requiring physical block devices or filesystem backing.
* Implementation Overview:
==========================
The memory backend introduces a new namespace configuration option via
configfs that allows users to create memory-backed namespaces by
setting the 'mem_size' attribute instead of 'device_path'.
1. Lazy Page Allocation
- Uses xarray for sparse page storage
- Pages are allocated lazily on first write; unwritten ranges read back as zeroes
2. Configfs Interface
New attribute: ${NVMET_CFGFS}/subsystems/<subsys>/namespaces/<ns>/mem_size
- Accepts size in bytes (e.g., "1073741824" for 1 GiB)
- Can only be set when namespace is disabled
- Mutually exclusive with device_path attribute
- Limited to 80% of total system memory for safety
3. I/O Command Support
- Read
- Write
- Flush
- Discard
- Write Zeroes
4. Backend Detection Logic (see the condensed sketch after this list)
Namespace backend is selected based on configuration:
- If mem_size is set (no device_path): Use memory backend
- If device_path points to block device: Use bdev backend
- If device_path points to regular file: Use file backend
5. The implementation follows the existing nvmet backend pattern with three
main entry points:
- nvmet_mem_ns_enable(): Initialize namespace with xarray storage
- nvmet_mem_ns_disable(): Release all pages and cleanup
- nvmet_mem_parse_io_cmd(): Dispatch I/O commands to handlers
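As referenced in item 4 above, here is a condensed sketch of the configfs size
cap and the backend selection, based on the nvmet_ns_mem_size_store() and
nvmet_ns_enable() changes in the patch below. Locking and some error paths are
trimmed, and the *_sketch function names are illustrative, not the names used
in the patch:

static ssize_t nvmet_ns_mem_size_store_sketch(struct nvmet_ns *ns, u64 new_size)
{
        /* cap mem_size at 80% of total system RAM */
        u64 max_size = ((u64)totalram_pages() * 80 / 100) << PAGE_SHIFT;

        if (ns->device_path || !new_size || new_size > max_size)
                return -EINVAL;
        if (ns->enabled)                /* size may only change while disabled */
                return -EBUSY;
        ns->size = new_size;
        return 0;
}

static int nvmet_ns_enable_backend_sketch(struct nvmet_ns *ns)
{
        int ret;

        if (!ns->device_path)           /* mem_size configured, no device_path */
                return nvmet_mem_ns_enable(ns);

        ret = nvmet_bdev_ns_enable(ns); /* try a block device first ... */
        if (ret == -ENOTBLK)            /* ... else fall back to a regular file */
                ret = nvmet_file_ns_enable(ns);
        return ret;
}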
I/O processing uses scatter-gather iteration similar to the existing backends,
with per-page operations that handle alignment and page-boundary cases; a
condensed sketch of the write path follows below.
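The following write-path sketch is condensed from nvmet_mem_execute_rw() and
nvmet_mem_write_chunk() in io-cmd-mem.c in the patch below (the read side is
symmetric, except that it zero-fills for pages that were never written). The
transfer-length/range checks and most error handling are abbreviated, and the
*_sketch names are illustrative; the helpers they call are the ones introduced
by the patch or already present in nvmet:

static int nvmet_mem_write_chunk_sketch(struct nvmet_ns *ns, void *src,
                                        sector_t sect, unsigned int len)
{
        /* look up the backing page, allocating it lazily on first write */
        struct page *page = nvmet_mem_insert_page(ns, sect);
        void *dst;

        if (IS_ERR(page))
                return PTR_ERR(page);

        dst = kmap_local_page(page);
        memcpy(dst + SECTOR_TO_OFFSET_IN_PAGE(sect), src, len);
        kunmap_local(dst);
        put_page(page);
        return 0;
}

static void nvmet_mem_execute_write_sketch(struct nvmet_req *req)
{
        sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
        struct sg_mapping_iter miter;

        sg_miter_start(&miter, req->sg, req->sg_cnt, SG_MITER_FROM_SG);
        while (sg_miter_next(&miter)) {
                unsigned int done = 0;

                while (done < miter.length) {
                        /* never cross a backing-page boundary in one copy */
                        unsigned int off = SECTOR_TO_OFFSET_IN_PAGE(sect);
                        unsigned int len = min_t(unsigned int,
                                                 miter.length - done,
                                                 PAGE_SIZE - off);

                        nvmet_mem_write_chunk_sketch(req->ns,
                                                     miter.addr + done,
                                                     sect, len);
                        sect += len >> SECTOR_SHIFT;
                        done += len;
                }
        }
        sg_miter_stop(&miter);
        nvmet_req_complete(req, NVME_SC_SUCCESS);
}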
* Use case: Ephemeral Scratch Space for High-Performance Workloads:
=================================================================
Training job starts, creates scratch namespace
# echo 10737418240 > /sys/kernel/config/nvmet/.../mem_size # 10 GiB
# echo 1 > /sys/kernel/config/nvmet/.../enable
Pod connects via NVMe-oF, downloads dataset to scratch space
Training job processes data with sub-millisecond latency
Intermediate results stored temporarily in memory-backed namespace
Job completes, namespace destroyed
# echo 0 > /sys/kernel/config/nvmet/.../enable
Memory automatically reclaimed, zero cleanup required
* Testing with blktests (mem backend support added) for nvme-loop and nvme-tcp:
==============================================================================
nvme/006 (tr=loop bd=device) (create an NVMeOF target) [passed]
nvme/006 (tr=loop bd=file) (create an NVMeOF target) [passed]
nvme/006 (tr=loop bd=mem) (create an NVMeOF target) [passed]
--
nvme/008 (tr=loop bd=device) (create an NVMeOF host) [passed]
nvme/008 (tr=loop bd=file) (create an NVMeOF host) [passed]
nvme/008 (tr=loop bd=mem) (create an NVMeOF host) [passed]
--
nvme/010 (tr=loop bd=device) (run data verification fio job) [passed]
nvme/010 (tr=loop bd=file) (run data verification fio job) [passed]
nvme/010 (tr=loop bd=mem) (run data verification fio job) [passed]
--
nvme/012 (tr=loop bd=device) (run mkfs and data verification fio) [passed]
nvme/012 (tr=loop bd=file) (run mkfs and data verification fio) [passed]
nvme/012 (tr=loop bd=mem) (run mkfs and data verification fio) [passed]
--
nvme/014 (tr=loop bd=device) (flush a command from host) [passed]
nvme/014 (tr=loop bd=file) (flush a command from host) [passed]
nvme/014 (tr=loop bd=mem) (flush a command from host) [passed]
--
nvme/019 (tr=loop bd=device) (test NVMe DSM Discard command) [passed]
nvme/019 (tr=loop bd=file) (test NVMe DSM Discard command) [passed]
nvme/019 (tr=loop bd=mem) (test NVMe DSM Discard command) [passed]
--
nvme/021 (tr=loop bd=device) (test NVMe list command) [passed]
nvme/021 (tr=loop bd=file) (test NVMe list command) [passed]
nvme/021 (tr=loop bd=mem) (test NVMe list command) [passed]
--
nvme/022 (tr=loop bd=device) (test NVMe reset command) [passed]
nvme/022 (tr=loop bd=file) (test NVMe reset command) [passed]
nvme/022 (tr=loop bd=mem) (test NVMe reset command) [passed]
--
nvme/023 (tr=loop bd=device) (test NVMe smart-log command) [passed]
nvme/023 (tr=loop bd=file) (test NVMe smart-log command) [passed]
nvme/023 (tr=loop bd=mem) (test NVMe smart-log command) [passed]
--
nvme/025 (tr=loop bd=device) (test NVMe effects-log) [passed]
nvme/025 (tr=loop bd=file) (test NVMe effects-log) [passed]
nvme/025 (tr=loop bd=mem) (test NVMe effects-log) [passed]
--
nvme/026 (tr=loop bd=device) (test NVMe ns-descs) [passed]
nvme/026 (tr=loop bd=file) (test NVMe ns-descs) [passed]
nvme/026 (tr=loop bd=mem) (test NVMe ns-descs) [passed]
--
nvme/027 (tr=loop bd=device) (test NVMe ns-rescan command) [passed]
nvme/027 (tr=loop bd=file) (test NVMe ns-rescan command) [passed]
nvme/027 (tr=loop bd=mem) (test NVMe ns-rescan command) [passed]
--
nvme/028 (tr=loop bd=device) (test NVMe list-subsys) [passed]
nvme/028 (tr=loop bd=file) (test NVMe list-subsys) [passed]
nvme/028 (tr=loop bd=mem) (test NVMe list-subsys) [passed]
--
nvme/006 (tr=tcp bd=device) (create an NVMeOF target) [passed]
nvme/006 (tr=tcp bd=file) (create an NVMeOF target) [passed]
nvme/006 (tr=tcp bd=mem) (create an NVMeOF target) [passed]
--
nvme/008 (tr=tcp bd=device) (create an NVMeOF host) [passed]
nvme/008 (tr=tcp bd=file) (create an NVMeOF host) [passed]
nvme/008 (tr=tcp bd=mem) (create an NVMeOF host) [passed]
--
nvme/010 (tr=tcp bd=device) (run data verification fio job) [passed]
nvme/010 (tr=tcp bd=file) (run data verification fio job) [passed]
nvme/010 (tr=tcp bd=mem) (run data verification fio job) [passed]
--
nvme/012 (tr=tcp bd=device) (run mkfs and data verification fio) [passed]
nvme/012 (tr=tcp bd=file) (run mkfs and data verification fio) [passed]
nvme/012 (tr=tcp bd=mem) (run mkfs and data verification fio) [passed]
--
nvme/014 (tr=tcp bd=device) (flush a command from host) [passed]
nvme/014 (tr=tcp bd=file) (flush a command from host) [passed]
nvme/014 (tr=tcp bd=mem) (flush a command from host) [passed]
--
nvme/019 (tr=tcp bd=device) (test NVMe DSM Discard command) [passed]
nvme/019 (tr=tcp bd=file) (test NVMe DSM Discard command) [passed]
nvme/019 (tr=tcp bd=mem) (test NVMe DSM Discard command) [passed]
--
nvme/021 (tr=tcp bd=device) (test NVMe list command) [passed]
nvme/021 (tr=tcp bd=file) (test NVMe list command) [passed]
nvme/021 (tr=tcp bd=mem) (test NVMe list command) [passed]
--
nvme/022 (tr=tcp bd=device) (test NVMe reset command) [passed]
nvme/022 (tr=tcp bd=file) (test NVMe reset command) [passed]
nvme/022 (tr=tcp bd=mem) (test NVMe reset command) [passed]
--
nvme/023 (tr=tcp bd=device) (test NVMe smart-log command) [passed]
nvme/023 (tr=tcp bd=file) (test NVMe smart-log command) [passed]
nvme/023 (tr=tcp bd=mem) (test NVMe smart-log command) [passed]
--
nvme/025 (tr=tcp bd=device) (test NVMe effects-log) [passed]
nvme/025 (tr=tcp bd=file) (test NVMe effects-log) [passed]
nvme/025 (tr=tcp bd=mem) (test NVMe effects-log) [passed]
--
nvme/026 (tr=tcp bd=device) (test NVMe ns-descs) [passed]
nvme/026 (tr=tcp bd=file) (test NVMe ns-descs) [passed]
nvme/026 (tr=tcp bd=mem) (test NVMe ns-descs) [passed]
--
nvme/027 (tr=tcp bd=device) (test NVMe ns-rescan command) [passed]
nvme/027 (tr=tcp bd=file) (test NVMe ns-rescan command) [passed]
nvme/027 (tr=tcp bd=mem) (test NVMe ns-rescan command) [passed]
--
nvme/028 (tr=tcp bd=device) (test NVMe list-subsys) [passed]
nvme/028 (tr=tcp bd=file) (test NVMe list-subsys) [passed]
nvme/028 (tr=tcp bd=mem) (test NVMe list-subsys) [passed]
-ck
Chaitanya Kulkarni (1):
nvmet: add basic in-memory backend support
drivers/nvme/target/Makefile | 2 +-
drivers/nvme/target/configfs.c | 61 +++++
drivers/nvme/target/core.c | 20 +-
drivers/nvme/target/io-cmd-mem.c | 426 +++++++++++++++++++++++++++++++
drivers/nvme/target/nvmet.h | 8 +
5 files changed, 511 insertions(+), 6 deletions(-)
create mode 100644 drivers/nvme/target/io-cmd-mem.c
nvme (nvme-6.19) # git log -1
commit ca6e4a009dfb06010f590fd80d9899a41cbbe01a (HEAD -> nvme-6.19)
Author: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Date: Tue Oct 14 00:05:33 2025 -0700
nvmet: add basic in-memory backend support
Add a new memory backend (io-cmd-mem.c) that enables dynamic, on-demand
RAM-backed storage for NVMe target namespaces. This provides instant,
zero-configuration volatile storage without requiring physical block
devices, filesystem backing, or pre-provisioned storage infrastructure.
Modern cloud-native workloads increasingly require dynamic allocation
of high-performance temporary storage for intermediate data processing,
such as AI/ML training scratch space, data analytics shuffle storage,
and in-memory database overflow. The memory backend addresses this need
by providing instant namespace creation with sub-millisecond latency
via NVMe-oF, eliminating traditional storage provisioning workflows
entirely.
Dynamic Configuration:
The memory backend introduces dynamic namespace configuration via
configfs, enabling instant namespace creation without storage
provisioning. Create memory-backed namespaces on-demand by setting
'mem_size' instead of 'device_path':
# Dynamic namespace creation - instant, no device setup required
echo 1073741824 > /sys/kernel/config/nvmet/.../mem_size
echo 1 > /sys/kernel/config/nvmet/.../enable
This eliminates the need for:
- Block device creation and management (no dd, losetup,
device provisioning)
- Filesystem mounting and configuration
- Storage capacity pre-allocation
- Device cleanup workflows after namespace deletion
Implementation details:
- Dynamic page allocation using xarray for sparse storage
- Pages allocated lazily on first write, efficient for partially filled
namespaces
- Full I/O command support: read, write, flush, discard, write-zeroes
- Mutually exclusive with device_path (memory XOR block/file backend)
- Size configurable per-namespace, limited to 80% of total system RAM
- Automatic memory reclamation on namespace deletion
- Page reference counting and cleanup
Backend selection logic:
- If mem_size is set (no device_path): Use memory backend
(dynamic allocation)
- If device_path points to block device: Use bdev backend
- If device_path points to regular file: Use file backend
The implementation follows the existing nvmet backend pattern with three
main entry points:
nvmet_mem_ns_enable() - Initialize namespace with xarray storage
nvmet_mem_ns_disable() - Release all pages and cleanup
nvmet_mem_parse_io_cmd() - Dispatch I/O commands to handlers
Tested with blktests memory backend test suite covering basic I/O
operations, discard/write-zeroes, all transport types (loop/TCP/RDMA),
dynamic namespace creation/deletion cycles, and proper resource cleanup.
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
nvme (nvme-6.19) # ./compile_nvme.sh
+ unload
+ sh ./unload-vfio-nvme.sh
rmmod: ERROR: Module drivers/vfio/pci/nvme/nvme_vfio_pci is not currently loaded
rmmod: ERROR: Module drivers/vfio/pci/vfio_pci is not currently loaded
rmmod: ERROR: Module drivers/vfio/pci/vfio_pci_core is not currently loaded
rmmod: ERROR: Module drivers/vfio/vfio_iommu_type1 is not currently loaded
rmmod: ERROR: Module drivers/vfio/vfio is not currently loaded
############################## UNLOAD #############################
nvme_loop 20480 0
nvmet 237568 1 nvme_loop
nvme_tcp 90112 0
nvme_fabrics 40960 2 nvme_tcp,nvme_loop
nvme_keyring 20480 3 nvmet,nvme_tcp,nvme_fabrics
nvme 69632 0
nvme_core 233472 5 nvmet,nvme_tcp,nvme,nvme_loop,nvme_fabrics
+ umount /mnt/nvme0n1
umount: /mnt/nvme0n1: no mount point specified.
+ ./delete.sh
+ NQN=testnqn
+ nvme disconnect -n testnqn
NQN:testnqn disconnected 0 controller(s)
real 0m0.002s
user 0m0.001s
sys 0m0.001s
+ rm -fr '/sys/kernel/config/nvmet/ports/1/subsystems/*'
+ rmdir /sys/kernel/config/nvmet/ports/1
rmdir: failed to remove '/sys/kernel/config/nvmet/ports/1': No such file or directory
+ for subsys in /sys/kernel/config/nvmet/subsystems/*
+ for ns in ${subsys}/namespaces/*
+ echo 0
./delete.sh: line 14: /sys/kernel/config/nvmet/subsystems/*/namespaces/*/enable: No such file or directory
+ rmdir '/sys/kernel/config/nvmet/subsystems/*/namespaces/*'
rmdir: failed to remove '/sys/kernel/config/nvmet/subsystems/*/namespaces/*': No such file or directory
+ rmdir '/sys/kernel/config/nvmet/subsystems/*'
rmdir: failed to remove '/sys/kernel/config/nvmet/subsystems/*': No such file or directory
+ rmdir 'config/nullb/nullb*'
rmdir: failed to remove 'config/nullb/nullb*': No such file or directory
+ umount /mnt/nvme0n1
umount: /mnt/nvme0n1: no mount point specified.
+ umount /mnt/backend
umount: /mnt/backend: not mounted.
+ echo '############################## DELETE #############################'
############################## DELETE #############################
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_loop
+ lsmod
+ grep nvme
nvme_tcp 90112 0
nvme_fabrics 40960 1 nvme_tcp
nvme_keyring 20480 2 nvme_tcp,nvme_fabrics
nvme 69632 0
nvme_core 233472 3 nvme_tcp,nvme,nvme_fabrics
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvmet
+ lsmod
+ grep nvme
nvme_tcp 90112 0
nvme_fabrics 40960 1 nvme_tcp
nvme_keyring 20480 2 nvme_tcp,nvme_fabrics
nvme 69632 0
nvme_core 233472 3 nvme_tcp,nvme,nvme_fabrics
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_tcp
+ lsmod
+ grep nvme
nvme 69632 0
nvme_core 233472 1 nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_fabrics
+ lsmod
+ grep nvme
nvme 69632 0
nvme_core 233472 1 nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme
+ lsmod
+ grep nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_core
+ lsmod
+ grep nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_keryring
modprobe: FATAL: Module nvme_keryring not found.
+ lsmod
+ grep nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r nvme_auth
modprobe: FATAL: Module nvme_auth not found.
+ lsmod
+ grep nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth null_blk
+ modprobe -r null_blk
+ lsmod
+ grep nvme
+ tree /sys/kernel/config
/sys/kernel/config
└── pci_ep
├── controllers
└── functions
3 directories, 0 files
+ echo '############################## UNLOAD #############################'
############################## UNLOAD #############################
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_loop unload '
### nvme_loop unload
+ modprobe -r nvme_loop
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvmet unload '
### nvmet unload
+ modprobe -r nvmet
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_tcp unload '
### nvme_tcp unload
+ modprobe -r nvme_tcp
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_fabrics unload '
### nvme_fabrics unload
+ modprobe -r nvme_fabrics
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme unload '
### nvme unload
+ modprobe -r nvme
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_core unload '
### nvme_core unload
+ modprobe -r nvme_core
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_keryring unload '
### nvme_keryring unload
+ modprobe -r nvme_keryring
modprobe: FATAL: Module nvme_keryring not found.
+ for mod in nvme_loop nvmet nvme_tcp nvme_fabrics nvme nvme_core nvme_keryring nvme_auth
+ echo '### nvme_auth unload '
### nvme_auth unload
+ modprobe -r nvme_auth
modprobe: FATAL: Module nvme_auth not found.
+ sleep 1
+ lsmod
+ grep nvme
+ git diff
+ getopts :cw option
++ nproc
+ make -j 48 M=drivers/nvme/ modules
make[1]: Entering directory '/mnt/data100G/nvme/drivers/nvme'
make[1]: Leaving directory '/mnt/data100G/nvme/drivers/nvme'
+ install
++ uname -r
+ LIB=/lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme
+ HOST=drivers/nvme/host
+ TARGET=drivers/nvme/target
+ HOST_DEST=/lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/host/
+ TARGET_DEST=/lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/target/
+ cp drivers/nvme/host/nvme-core.ko drivers/nvme/host/nvme-fabrics.ko drivers/nvme/host/nvme-fc.ko drivers/nvme/host/nvme.ko drivers/nvme/host/nvme-rdma.ko drivers/nvme/host/nvme-tcp.ko /lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/host//
+ cp drivers/nvme/target/nvme-fcloop.ko drivers/nvme/target/nvme-loop.ko drivers/nvme/target/nvmet-fc.ko drivers/nvme/target/nvmet.ko drivers/nvme/target/nvmet-pci-epf.ko drivers/nvme/target/nvmet-rdma.ko drivers/nvme/target/nvmet-tcp.ko /lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/target//
+ ls -lrth /lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/host/ /lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/target//
/lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/host/:
total 9.0M
-rw-r--r--. 1 root root 4.2M Nov 3 23:46 nvme-core.ko
-rw-r--r--. 1 root root 592K Nov 3 23:46 nvme-fabrics.ko
-rw-r--r--. 1 root root 1.2M Nov 3 23:46 nvme-fc.ko
-rw-r--r--. 1 root root 929K Nov 3 23:46 nvme.ko
-rw-r--r--. 1 root root 1.2M Nov 3 23:46 nvme-rdma.ko
-rw-r--r--. 1 root root 1.2M Nov 3 23:46 nvme-tcp.ko
/lib/modules/6.17.0-rc3nvme+/kernel/drivers/nvme/target//:
total 9.9M
-rw-r--r--. 1 root root 660K Nov 3 23:46 nvme-fcloop.ko
-rw-r--r--. 1 root root 563K Nov 3 23:46 nvme-loop.ko
-rw-r--r--. 1 root root 1009K Nov 3 23:46 nvmet-fc.ko
-rw-r--r--. 1 root root 4.9M Nov 3 23:46 nvmet.ko
-rw-r--r--. 1 root root 765K Nov 3 23:46 nvmet-pci-epf.ko
-rw-r--r--. 1 root root 1.1M Nov 3 23:46 nvmet-rdma.ko
-rw-r--r--. 1 root root 994K Nov 3 23:46 nvmet-tcp.ko
+ sync
+ modprobe nvme-core
+ modprobe nvme
+ modprobe nvme-fabrics
+ modprobe nvme-tcp
+ modprobe nvme_loop
+ modprobe nvmet
+ lsmod
+ grep nvme
nvme_loop 20480 0
nvmet 237568 1 nvme_loop
nvme_tcp 90112 0
nvme_fabrics 40960 2 nvme_tcp,nvme_loop
nvme_keyring 20480 3 nvmet,nvme_tcp,nvme_fabrics
nvme 69632 0
nvme_core 233472 5 nvmet,nvme_tcp,nvme,nvme_loop,nvme_fabrics
nvme (nvme-6.19) # cdblktests
blktests (master) # sh test-nvme.sh
+ for t in loop tcp
+ echo '################nvme_trtype=loop############'
################nvme_trtype=loop############
+ nvme_trtype=loop
+ ./check nvme
nvme/002 (tr=loop) (create many subsystems and test discovery) [passed]
runtime 45.859s ... 43.717s
nvme/003 (tr=loop) (test if we're sending keep-alives to a discovery controller) [passed]
runtime 10.229s ... 10.226s
nvme/004 (tr=loop) (test nvme and nvmet UUID NS descriptors) [passed]
runtime 0.796s ... 0.800s
nvme/005 (tr=loop) (reset local loopback target) [passed]
runtime 1.384s ... 1.396s
nvme/006 (tr=loop bd=device) (create an NVMeOF target) [passed]
runtime 0.096s ... 0.092s
nvme/006 (tr=loop bd=file) (create an NVMeOF target) [passed]
runtime 0.069s ... 0.069s
nvme/006 (tr=loop bd=mem) (create an NVMeOF target) [passed]
runtime 0.074s ... 0.072s
nvme/008 (tr=loop bd=device) (create an NVMeOF host) [passed]
runtime 0.814s ... 0.804s
nvme/008 (tr=loop bd=file) (create an NVMeOF host) [passed]
runtime 0.772s ... 0.790s
nvme/008 (tr=loop bd=mem) (create an NVMeOF host) [passed]
runtime 0.781s ... 0.807s
nvme/010 (tr=loop bd=device) (run data verification fio job) [passed]
runtime 77.849s ... 74.891s
nvme/010 (tr=loop bd=file) (run data verification fio job) [passed]
runtime 151.337s ... 152.352s
nvme/010 (tr=loop bd=mem) (run data verification fio job) [passed]
runtime 5.068s ... 4.953s
nvme/012 (tr=loop bd=device) (run mkfs and data verification fio) [passed]
runtime 70.436s ... 73.367s
nvme/012 (tr=loop bd=file) (run mkfs and data verification fio) [passed]
runtime 134.884s ... 137.229s
nvme/012 (tr=loop bd=mem) (run mkfs and data verification fio) [passed]
runtime 15.360s ... 14.633s
nvme/014 (tr=loop bd=device) (flush a command from host) [passed]
runtime 8.608s ... 10.393s
nvme/014 (tr=loop bd=file) (flush a command from host) [passed]
runtime 7.614s ... 7.489s
nvme/014 (tr=loop bd=mem) (flush a command from host) [passed]
runtime 5.053s ... 5.201s
nvme/016 (tr=loop) (create/delete many NVMeOF block device-backed ns and test discovery) [passed]
runtime 34.080s ... 33.866s
nvme/017 (tr=loop) (create/delete many file-ns and test discovery) [passed]
runtime 35.291s ... 35.260s
nvme/018 (tr=loop) (unit test NVMe-oF out of range access on a file backend) [passed]
runtime 0.768s ... 0.766s
nvme/019 (tr=loop bd=device) (test NVMe DSM Discard command) [passed]
runtime 0.795s ... 0.809s
nvme/019 (tr=loop bd=file) (test NVMe DSM Discard command) [passed]
runtime 0.766s ... 0.770s
nvme/019 (tr=loop bd=mem) (test NVMe DSM Discard command) [passed]
runtime 0.771s ... 0.774s
nvme/021 (tr=loop bd=device) (test NVMe list command) [passed]
runtime 0.794s ... 0.800s
nvme/021 (tr=loop bd=file) (test NVMe list command) [passed]
runtime 0.771s ... 0.798s
nvme/021 (tr=loop bd=mem) (test NVMe list command) [passed]
runtime 0.779s ... 0.768s
nvme/022 (tr=loop bd=device) (test NVMe reset command) [passed]
runtime 1.376s ... 1.366s
nvme/022 (tr=loop bd=file) (test NVMe reset command) [passed]
runtime 1.361s ... 1.335s
nvme/022 (tr=loop bd=mem) (test NVMe reset command) [passed]
runtime 1.374s ... 1.245s
nvme/023 (tr=loop bd=device) (test NVMe smart-log command) [passed]
runtime 0.783s ... 0.806s
nvme/023 (tr=loop bd=file) (test NVMe smart-log command) [passed]
runtime 0.757s ... 0.783s
nvme/023 (tr=loop bd=mem) (test NVMe smart-log command) [passed]
runtime 0.790s ... 0.792s
nvme/025 (tr=loop bd=device) (test NVMe effects-log) [passed]
runtime 0.796s ... 0.791s
nvme/025 (tr=loop bd=file) (test NVMe effects-log) [passed]
runtime 0.797s ... 0.771s
nvme/025 (tr=loop bd=mem) (test NVMe effects-log) [passed]
runtime 0.784s ... 0.777s
nvme/026 (tr=loop bd=device) (test NVMe ns-descs) [passed]
runtime 0.794s ... 0.805s
nvme/026 (tr=loop bd=file) (test NVMe ns-descs) [passed]
runtime 0.765s ... 0.775s
nvme/026 (tr=loop bd=mem) (test NVMe ns-descs) [passed]
runtime 0.791s ... 0.789s
nvme/027 (tr=loop bd=device) (test NVMe ns-rescan command) [passed]
runtime 0.820s ... 0.831s
nvme/027 (tr=loop bd=file) (test NVMe ns-rescan command) [passed]
runtime 0.786s ... 0.795s
nvme/027 (tr=loop bd=mem) (test NVMe ns-rescan command) [passed]
runtime 0.795s ... 0.812s
nvme/028 (tr=loop bd=device) (test NVMe list-subsys) [passed]
runtime 0.775s ... 0.786s
nvme/028 (tr=loop bd=file) (test NVMe list-subsys) [passed]
runtime 0.765s ... 0.755s
nvme/028 (tr=loop bd=mem) (test NVMe list-subsys) [passed]
runtime 0.762s ... 0.776s
nvme/029 (tr=loop) (test userspace IO via nvme-cli read/write interface) [passed]
runtime 0.921s ... 0.906s
nvme/030 (tr=loop) (ensure the discovery generation counter is updated appropriately) [passed]
runtime 0.466s ... 0.475s
nvme/031 (tr=loop) (test deletion of NVMeOF controllers immediately after setup) [passed]
runtime 7.590s ... 7.470s
nvme/038 (tr=loop) (test deletion of NVMeOF subsystem without enabling) [passed]
runtime 0.017s ... 0.016s
nvme/040 (tr=loop) (test nvme fabrics controller reset/disconnect operation during I/O) [passed]
runtime 7.792s ... 7.836s
nvme/041 (tr=loop) (Create authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/042 (tr=loop) (Test dhchap key types for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/043 (tr=loop) (Test hash and DH group variations for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/044 (tr=loop) (Test bi-directional authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/045 (tr=loop) (Test re-authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/047 (tr=loop) (test different queue types for fabric transports) [not run]
nvme_trtype=loop is not supported in this test
nvme/048 (tr=loop) (Test queue count changes on reconnect) [not run]
nvme_trtype=loop is not supported in this test
nvme/051 (tr=loop) (test nvmet concurrent ns enable/disable) [passed]
runtime 4.137s ... 3.832s
nvme/052 (tr=loop) (Test file-ns creation/deletion under one subsystem) [passed]
runtime 6.018s ... 5.973s
nvme/054 (tr=loop) (Test the NVMe reservation feature) [passed]
runtime 0.809s ... 0.809s
nvme/055 (tr=loop) (Test nvme write to a loop target ns just after ns is disabled) [passed]
runtime 0.802s ... 0.792s
nvme/056 (tr=loop) (enable zero copy offload and run rw traffic) [not run]
Remote target required but NVME_TARGET_CONTROL is not set
nvme_trtype=loop is not supported in this test
kernel option ULP_DDP has not been enabled
module nvme_tcp does not have parameter ddp_offload
KERNELSRC not set
Kernel sources do not have tools/net/ynl/cli.py
NVME_IFACE not set
nvme/057 (tr=loop) (test nvme fabrics controller ANA failover during I/O) [passed]
runtime 31.640s ... 31.473s
nvme/058 (tr=loop) (test rapid namespace remapping) [passed]
runtime 6.870s ... 5.595s
nvme/060 (tr=loop) (test nvme fabrics target reset) [not run]
nvme_trtype=loop is not supported in this test
nvme/061 (tr=loop) (test fabric target teardown and setup during I/O) [not run]
nvme_trtype=loop is not supported in this test
nvme/062 (tr=loop) (Create TLS-encrypted connections) [not run]
nvme_trtype=loop is not supported in this test
command tlshd is not available
systemctl unit 'tlshd' is missing
Install ktls-utils for tlshd
nvme/063 (tr=loop) (Create authenticated TCP connections with secure concatenation) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme_trtype=loop is not supported in this test
command tlshd is not available
systemctl unit 'tlshd' is missing
Install ktls-utils for tlshd
nvme/065 (test unmap write zeroes sysfs interface with nvmet devices) [passed]
runtime 2.754s ... 2.780s
+ for t in loop tcp
+ echo '################nvme_trtype=tcp############'
################nvme_trtype=tcp############
+ nvme_trtype=tcp
+ ./check nvme
nvme/002 (tr=tcp) (create many subsystems and test discovery) [not run]
nvme_trtype=tcp is not supported in this test
nvme/003 (tr=tcp) (test if we're sending keep-alives to a discovery controller) [passed]
runtime 10.260s ... 10.276s
nvme/004 (tr=tcp) (test nvme and nvmet UUID NS descriptors) [passed]
runtime 0.291s ... 0.304s
nvme/005 (tr=tcp) (reset local loopback target) [passed]
runtime 0.373s ... 0.380s
nvme/006 (tr=tcp bd=device) (create an NVMeOF target) [passed]
runtime 0.100s ... 0.095s
nvme/006 (tr=tcp bd=file) (create an NVMeOF target) [passed]
runtime 0.076s ... 0.075s
nvme/006 (tr=tcp bd=mem) (create an NVMeOF target) [passed]
runtime 0.078s ... 0.085s
nvme/008 (tr=tcp bd=device) (create an NVMeOF host) [passed]
runtime 0.381s ... 0.761s
nvme/008 (tr=tcp bd=file) (create an NVMeOF host) [passed]
runtime 0.274s ... 0.280s
nvme/008 (tr=tcp bd=mem) (create an NVMeOF host) [passed]
runtime 0.270s ... 0.282s
nvme/010 (tr=tcp bd=device) (run data verification fio job) [passed]
runtime 87.044s ... 87.870s
nvme/010 (tr=tcp bd=file) (run data verification fio job) [passed]
runtime 152.540s ... 149.576s
nvme/010 (tr=tcp bd=mem) (run data verification fio job) [passed]
runtime 17.433s ... 17.207s
nvme/012 (tr=tcp bd=device) (run mkfs and data verification fio) [passed]
runtime 80.288s ... 76.660s
nvme/012 (tr=tcp bd=file) (run mkfs and data verification fio) [passed]
runtime 142.149s ... 133.708s
nvme/012 (tr=tcp bd=mem) (run mkfs and data verification fio) [passed]
runtime 27.757s ... 27.869s
nvme/014 (tr=tcp bd=device) (flush a command from host) [passed]
runtime 7.771s ... 7.752s
nvme/014 (tr=tcp bd=file) (flush a command from host) [passed]
runtime 7.055s ... 7.014s
nvme/014 (tr=tcp bd=mem) (flush a command from host) [passed]
runtime 4.650s ... 4.561s
nvme/016 (tr=tcp) (create/delete many NVMeOF block device-backed ns and test discovery) [not run]
nvme_trtype=tcp is not supported in this test
nvme/017 (tr=tcp) (create/delete many file-ns and test discovery) [not run]
nvme_trtype=tcp is not supported in this test
nvme/018 (tr=tcp) (unit test NVMe-oF out of range access on a file backend) [passed]
runtime 0.263s ... 0.270s
nvme/019 (tr=tcp bd=device) (test NVMe DSM Discard command) [passed]
runtime 0.277s ... 0.290s
nvme/019 (tr=tcp bd=file) (test NVMe DSM Discard command) [passed]
runtime 0.257s ... 0.276s
nvme/019 (tr=tcp bd=mem) (test NVMe DSM Discard command) [passed]
runtime 0.261s ... 0.283s
nvme/021 (tr=tcp bd=device) (test NVMe list command) [passed]
runtime 0.294s ... 0.289s
nvme/021 (tr=tcp bd=file) (test NVMe list command) [passed]
runtime 0.274s ... 0.272s
nvme/021 (tr=tcp bd=mem) (test NVMe list command) [passed]
runtime 0.255s ... 0.267s
nvme/022 (tr=tcp bd=device) (test NVMe reset command) [passed]
runtime 0.380s ... 0.410s
nvme/022 (tr=tcp bd=file) (test NVMe reset command) [passed]
runtime 0.369s ... 0.358s
nvme/022 (tr=tcp bd=mem) (test NVMe reset command) [passed]
runtime 0.354s ... 0.357s
nvme/023 (tr=tcp bd=device) (test NVMe smart-log command) [passed]
runtime 0.288s ... 0.288s
nvme/023 (tr=tcp bd=file) (test NVMe smart-log command) [passed]
runtime 0.267s ... 0.251s
nvme/023 (tr=tcp bd=mem) (test NVMe smart-log command) [passed]
runtime 0.275s ... 0.276s
nvme/025 (tr=tcp bd=device) (test NVMe effects-log) [passed]
runtime 0.290s ... 0.296s
nvme/025 (tr=tcp bd=file) (test NVMe effects-log) [passed]
runtime 0.278s ... 0.275s
nvme/025 (tr=tcp bd=mem) (test NVMe effects-log) [passed]
runtime 0.290s ... 0.278s
nvme/026 (tr=tcp bd=device) (test NVMe ns-descs) [passed]
runtime 0.302s ... 0.283s
nvme/026 (tr=tcp bd=file) (test NVMe ns-descs) [passed]
runtime 0.259s ... 0.254s
nvme/026 (tr=tcp bd=mem) (test NVMe ns-descs) [passed]
runtime 0.265s ... 0.259s
nvme/027 (tr=tcp bd=device) (test NVMe ns-rescan command) [passed]
runtime 0.339s ... 0.318s
nvme/027 (tr=tcp bd=file) (test NVMe ns-rescan command) [passed]
runtime 0.296s ... 0.304s
nvme/027 (tr=tcp bd=mem) (test NVMe ns-rescan command) [passed]
runtime 0.317s ... 0.318s
nvme/028 (tr=tcp bd=device) (test NVMe list-subsys) [passed]
runtime 0.274s ... 0.284s
nvme/028 (tr=tcp bd=file) (test NVMe list-subsys) [passed]
runtime 0.249s ... 0.271s
nvme/028 (tr=tcp bd=mem) (test NVMe list-subsys) [passed]
runtime 0.269s ... 0.278s
nvme/029 (tr=tcp) (test userspace IO via nvme-cli read/write interface) [passed]
runtime 0.425s ... 0.438s
nvme/030 (tr=tcp) (ensure the discovery generation counter is updated appropriately) [passed]
runtime 0.347s ... 0.352s
nvme/031 (tr=tcp) (test deletion of NVMeOF controllers immediately after setup) [passed]
runtime 2.437s ... 2.544s
nvme/038 (tr=tcp) (test deletion of NVMeOF subsystem without enabling) [passed]
runtime 0.022s ... 0.021s
nvme/040 (tr=tcp) (test nvme fabrics controller reset/disconnect operation during I/O) [passed]
runtime 6.404s ... 6.397s
nvme/041 (tr=tcp) (Create authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/042 (tr=tcp) (Test dhchap key types for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/043 (tr=tcp) (Test hash and DH group variations for authenticated connections) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/044 (tr=tcp) (Test bi-directional authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/045 (tr=tcp) (Test re-authentication) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
nvme/047 (tr=tcp) (test different queue types for fabric transports) [passed]
runtime 1.195s ... 1.197s
nvme/048 (tr=tcp) (Test queue count changes on reconnect) [passed]
runtime 6.368s ... 6.361s
nvme/051 (tr=tcp) (test nvmet concurrent ns enable/disable) [passed]
runtime 4.473s ... 3.472s
nvme/052 (tr=tcp) (Test file-ns creation/deletion under one subsystem) [not run]
nvme_trtype=tcp is not supported in this test
nvme/054 (tr=tcp) (Test the NVMe reservation feature) [passed]
runtime 0.311s ... 0.317s
nvme/055 (tr=tcp) (Test nvme write to a loop target ns just after ns is disabled) [not run]
nvme_trtype=tcp is not supported in this test
nvme/056 (tr=tcp) (enable zero copy offload and run rw traffic) [not run]
Remote target required but NVME_TARGET_CONTROL is not set
kernel option ULP_DDP has not been enabled
module nvme_tcp does not have parameter ddp_offload
KERNELSRC not set
Kernel sources do not have tools/net/ynl/cli.py
NVME_IFACE not set
nvme/057 (tr=tcp) (test nvme fabrics controller ANA failover during I/O) [passed]
runtime 28.377s ... 28.039s
nvme/058 (tr=tcp) (test rapid namespace remapping) [passed]
runtime 4.421s ... 3.264s
nvme/060 (tr=tcp) (test nvme fabrics target reset) [passed]
runtime 19.087s ... 19.643s
nvme/061 (tr=tcp) (test fabric target teardown and setup during I/O) [passed]
runtime 8.446s ... 8.344s
nvme/062 (tr=tcp) (Create TLS-encrypted connections) [not run]
command tlshd is not available
systemctl unit 'tlshd' is missing
Install ktls-utils for tlshd
nvme/063 (tr=tcp) (Create authenticated TCP connections with secure concatenation) [not run]
kernel option NVME_AUTH has not been enabled
kernel option NVME_TARGET_AUTH has not been enabled
nvme-fabrics does not support dhchap_ctrl_secret
command tlshd is not available
systemctl unit 'tlshd' is missing
Install ktls-utils for tlshd
nvme/065 (test unmap write zeroes sysfs interface with nvmet devices) [passed]
runtime 2.780s ... 2.779s
blktests (master) #
--
2.40.0
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH 1/1] nvmet: add basic in-memory backend support 2025-11-04 8:06 [PATCH 0/1] nvmet: add basic in-memory backend support Chaitanya Kulkarni @ 2025-11-04 8:06 ` Chaitanya Kulkarni 2025-11-04 10:36 ` [PATCH 0/1] " Hannes Reinecke 1 sibling, 0 replies; 12+ messages in thread From: Chaitanya Kulkarni @ 2025-11-04 8:06 UTC (permalink / raw) To: linux-nvme; +Cc: hch, sagi, kch, Chaitanya Kulkarni Add a new memory backend (io-cmd-mem.c) that enables dynamic, on-demand RAM-backed storage for NVMe target namespaces. This provides instant, zero-configuration volatile storage without requiring physical block devices, filesystem backing, or pre-provisioned storage infrastructure. Modern cloud-native workloads increasingly require dynamic allocation of high-performance temporary storage for intermediate data processing, such as AI/ML training scratch space, data analytics shuffle storage, and in-memory database overflow. The memory backend addresses this need by providing instant namespace creation with sub-millisecond latency via NVMe-oF, eliminating traditional storage provisioning workflows entirely. Dynamic Configuration: The memory backend introduces dynamic namespace configuration via configfs, enabling instant namespace creation without storage provisioning. Create memory-backed namespaces on-demand by setting 'mem_size' instead of 'device_path': # Dynamic namespace creation - instant, no device setup required echo 1073741824 > /sys/kernel/config/nvmet/.../mem_size echo 1 > /sys/kernel/config/nvmet/.../enable This eliminates the need for: - Block device creation and management (no dd, losetup, device provisioning) - Filesystem mounting and configuration - Storage capacity pre-allocation - Device cleanup workflows after namespace deletion Implementation detail :- - Dynamic page allocation using xarray for sparse storage - Pages allocated lazily on first write, efficient for partially filled namespaces - Full I/O command support: read, write, flush, discard, write-zeroes - Mutually exclusive with device_path (memory XOR block/file backend) - Size configurable per-namespace, limited to 80% of total system RAM - Automatic memory reclamation on namespace deletion - Page reference counting and cleanup Backend selection logic: - If mem_size is set (no device_path): Use memory backend (dynamic allocation) - If device_path points to block device: Use bdev backend - If device_path points to regular file: Use file backend The implementation follows the existing nvmet backend pattern with three main entry points: nvmet_mem_ns_enable() - Initialize namespace with xarray storage nvmet_mem_ns_disable() - Release all pages and cleanup nvmet_mem_parse_io_cmd() - Dispatch I/O commands to handlers Tested with blktests memory backend test suite covering basic I/O operations, discard/write-zeroes, all transport types (loop/TCP/RDMA), dynamic namespace creation/deletion cycles, and proper resource cleanup. 
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> --- drivers/nvme/target/Makefile | 2 +- drivers/nvme/target/configfs.c | 61 +++++ drivers/nvme/target/core.c | 20 +- drivers/nvme/target/io-cmd-mem.c | 426 +++++++++++++++++++++++++++++++ drivers/nvme/target/nvmet.h | 8 + 5 files changed, 511 insertions(+), 6 deletions(-) create mode 100644 drivers/nvme/target/io-cmd-mem.c diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile index ed8522911d1f..f27f2bf5a62d 100644 --- a/drivers/nvme/target/Makefile +++ b/drivers/nvme/target/Makefile @@ -11,7 +11,7 @@ obj-$(CONFIG_NVME_TARGET_TCP) += nvmet-tcp.o obj-$(CONFIG_NVME_TARGET_PCI_EPF) += nvmet-pci-epf.o nvmet-y += core.o configfs.o admin-cmd.o fabrics-cmd.o \ - discovery.o io-cmd-file.o io-cmd-bdev.o pr.o + discovery.o io-cmd-file.o io-cmd-bdev.o io-cmd-mem.o pr.o nvmet-$(CONFIG_NVME_TARGET_DEBUGFS) += debugfs.o nvmet-$(CONFIG_NVME_TARGET_PASSTHRU) += passthru.o nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index 2642e3148f3f..f6ef3404cb81 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -535,6 +535,66 @@ static ssize_t nvmet_ns_device_path_store(struct config_item *item, CONFIGFS_ATTR(nvmet_ns_, device_path); +static ssize_t nvmet_ns_mem_size_show(struct config_item *item, char *page) +{ + struct nvmet_ns *ns = to_nvmet_ns(item); + + /* Only show size for memory-backed namespaces */ + if (ns->device_path) + return sprintf(page, "0\n"); + + return sprintf(page, "%llu\n", ns->size); +} + +static ssize_t nvmet_ns_mem_size_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_ns *ns = to_nvmet_ns(item); + struct nvmet_subsys *subsys = ns->subsys; + u64 new_size, max_size; + int ret; + + ret = kstrtou64(page, 0, &new_size); + if (ret) + return ret; + + mutex_lock(&subsys->lock); + + /* Only allow for memory-backed namespaces */ + if (ns->device_path) { + ret = -EINVAL; + goto out_unlock; + } + + /* Can only change size when namespace is disabled */ + if (ns->enabled) { + ret = -EBUSY; + goto out_unlock; + } + + if (new_size == 0) { + ret = -EINVAL; + goto out_unlock; + } + + /* Limit to 80% of total system memory */ + max_size = ((u64)totalram_pages() * 80 / 100) << PAGE_SHIFT; + if (new_size > max_size) { + ret = -EINVAL; + goto out_unlock; + } + + ns->size = new_size; + mutex_unlock(&subsys->lock); + return count; + +out_unlock: + mutex_unlock(&subsys->lock); + return ret; +} + +CONFIGFS_ATTR(nvmet_ns_, mem_size); + #ifdef CONFIG_PCI_P2PDMA static ssize_t nvmet_ns_p2pmem_show(struct config_item *item, char *page) { @@ -800,6 +860,7 @@ CONFIGFS_ATTR(nvmet_ns_, resv_enable); static struct configfs_attribute *nvmet_ns_attrs[] = { &nvmet_ns_attr_device_path, + &nvmet_ns_attr_mem_size, &nvmet_ns_attr_device_nguid, &nvmet_ns_attr_device_uuid, &nvmet_ns_attr_ana_grpid, diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index cc88e5a28c8a..a2a5e10c2bbd 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -467,6 +467,7 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns) { nvmet_bdev_ns_disable(ns); nvmet_file_ns_disable(ns); + nvmet_mem_ns_disable(ns); } static int nvmet_p2pmem_ns_enable(struct nvmet_ns *ns) @@ -557,8 +558,10 @@ bool nvmet_ns_revalidate(struct nvmet_ns *ns) if (ns->bdev) nvmet_bdev_ns_revalidate(ns); - else + else if (ns->file) nvmet_file_ns_revalidate(ns); + else + nvmet_mem_ns_revalidate(ns); return oldsize != ns->size; } @@ -580,9 
+583,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns) if (ns->enabled) goto out_unlock; - ret = nvmet_bdev_ns_enable(ns); - if (ret == -ENOTBLK) - ret = nvmet_file_ns_enable(ns); + /* Memory backend if no device_path is set */ + if (!ns->device_path) { + ret = nvmet_mem_ns_enable(ns); + } else { + ret = nvmet_bdev_ns_enable(ns); + if (ret == -ENOTBLK) + ret = nvmet_file_ns_enable(ns); + } if (ret) goto out_unlock; @@ -1121,8 +1129,10 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req) case NVME_CSI_NVM: if (req->ns->file) ret = nvmet_file_parse_io_cmd(req); - else + else if (req->ns->bdev) ret = nvmet_bdev_parse_io_cmd(req); + else + ret = nvmet_mem_parse_io_cmd(req); break; case NVME_CSI_ZNS: if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) diff --git a/drivers/nvme/target/io-cmd-mem.c b/drivers/nvme/target/io-cmd-mem.c new file mode 100644 index 000000000000..a92c639490cd --- /dev/null +++ b/drivers/nvme/target/io-cmd-mem.c @@ -0,0 +1,426 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe over Fabrics target in-memory I/O command implementation. + * Copyright (c) 2024 NVIDIA Corporation. + * Author: Chaitanya Kulkarni <kch@nvidia.com> + */ +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/module.h> +#include <linux/xarray.h> +#include <linux/highmem.h> +#include <linux/overflow.h> +#include "nvmet.h" + +/* Convert sector to xarray page index */ +#define SECTOR_TO_PAGE_IDX(sector) \ + ((sector) >> (PAGE_SHIFT - SECTOR_SHIFT)) + +/* Calculate byte offset within a page for a given sector */ +#define SECTOR_TO_OFFSET_IN_PAGE(sector) \ + (((sector) << SECTOR_SHIFT) & ~PAGE_MASK) + +/* + * Validate LBA range against namespace size. + * Returns 0 if valid, NVMe error status if out of bounds. + */ +static u16 nvmet_mem_check_range(struct nvmet_ns *ns, u64 slba, u32 nlb) +{ + u64 end_lba; + + /* Check for overflow in end_lba calculation */ + if (unlikely(check_add_overflow(slba, nlb, &end_lba))) + return NVME_SC_LBA_RANGE | NVME_STATUS_DNR; + + /* Convert namespace size (bytes) to LBAs */ + if (unlikely(end_lba > (ns->size >> ns->blksize_shift))) + return NVME_SC_LBA_RANGE | NVME_STATUS_DNR; + + return 0; +} + +/* + * Look up a page with refcount grabbed. + * Returns page with reference, or NULL if not found. + */ +static struct page *nvmet_mem_lookup_page(struct nvmet_ns *ns, pgoff_t idx) +{ + struct page *page; + + rcu_read_lock(); +repeat: + page = xa_load(&ns->mem_pages, idx); + if (!page) + goto out; + + if (!get_page_unless_zero(page)) + goto repeat; + + /* + * Verify page is still in tree after getting refcount. + * If not, it's being removed - drop ref and retry. + */ + if (unlikely(page != xa_load(&ns->mem_pages, idx))) { + put_page(page); + goto repeat; + } +out: + rcu_read_unlock(); + return page; +} + +/* + * Allocate and insert a page into the namespace xarray. + * Returns the page with reference grabbed on success, ERR_PTR on failure. + * Caller must call put_page() when done. 
+ */ +static struct page *nvmet_mem_insert_page(struct nvmet_ns *ns, sector_t sect) +{ + pgoff_t idx = SECTOR_TO_PAGE_IDX(sect); + struct page *page, *ret; + + /* Fast path: check if page already exists */ + page = nvmet_mem_lookup_page(ns, idx); + if (page) + return page; + + /* Allocate new page outside of lock */ + page = alloc_page(GFP_NOWAIT | __GFP_ZERO | __GFP_NOWARN); + if (!page) + return ERR_PTR(-ENOMEM); + + /* Try to insert - handle race with xa_cmpxchg */ + xa_lock(&ns->mem_pages); + ret = __xa_cmpxchg(&ns->mem_pages, idx, NULL, page, GFP_ATOMIC); + + if (!ret) { + /* We successfully inserted the page */ + ns->mem_nr_pages++; + get_page(page); /* Reference for caller */ + xa_unlock(&ns->mem_pages); + return page; + } + + if (!xa_is_err(ret)) { + /* Another thread won the race - use their page */ + get_page(ret); /* Reference for caller */ + xa_unlock(&ns->mem_pages); + put_page(page); /* Free our allocated page */ + return ret; + } + + /* Insertion failed due to xarray error */ + xa_unlock(&ns->mem_pages); + put_page(page); + return ERR_PTR(xa_err(ret)); +} + +static int nvmet_mem_read_chunk(struct nvmet_ns *ns, void *sgl_addr, + sector_t sect, unsigned int copy_len) +{ + unsigned int offset = SECTOR_TO_OFFSET_IN_PAGE(sect); + pgoff_t idx = SECTOR_TO_PAGE_IDX(sect); + struct page *page; + void *src; + + page = nvmet_mem_lookup_page(ns, idx); + if (page) { + src = kmap_local_page(page); + memcpy(sgl_addr, src + offset, copy_len); + kunmap_local(src); + put_page(page); /* Drop reference */ + } else { + memset(sgl_addr, 0, copy_len); + } + + return 0; +} + +static int nvmet_mem_write_chunk(struct nvmet_ns *ns, void *sgl_addr, + sector_t sect, unsigned int copy_len) +{ + unsigned int offset = SECTOR_TO_OFFSET_IN_PAGE(sect); + struct page *page; + void *dst; + + page = nvmet_mem_insert_page(ns, sect); + if (IS_ERR(page)) + return PTR_ERR(page); + + dst = kmap_local_page(page); + memcpy(dst + offset, sgl_addr, copy_len); + kunmap_local(dst); + put_page(page); + + return 0; +} + +static void nvmet_mem_execute_rw(struct nvmet_req *req) +{ + int (*process_chunk)(struct nvmet_ns *ns, void *sgl_addr, + sector_t sect, unsigned int copy_len); + struct nvmet_ns *ns = req->ns; + struct sg_mapping_iter miter; + unsigned int sg_flags; + sector_t sect; + u16 status = NVME_SC_SUCCESS; + int ret; + + if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req))) + return; + + status = nvmet_mem_check_range(ns, le64_to_cpu(req->cmd->rw.slba), + le16_to_cpu(req->cmd->rw.length) + 1); + if (status) { + nvmet_req_complete(req, status); + return; + } + + sect = nvmet_lba_to_sect(ns, req->cmd->rw.slba); + + if (req->cmd->rw.opcode == nvme_cmd_write) { + sg_flags = SG_MITER_FROM_SG; + process_chunk = nvmet_mem_write_chunk; + } else { + sg_flags = SG_MITER_TO_SG; + process_chunk = nvmet_mem_read_chunk; + } + + sg_miter_start(&miter, req->sg, req->sg_cnt, sg_flags); + + while (sg_miter_next(&miter)) { + unsigned int miter_offset = 0; + unsigned int miter_len = miter.length; + + while (miter_len > 0) { + unsigned int offset, copy_len; + + offset = SECTOR_TO_OFFSET_IN_PAGE(sect); + copy_len = min_t(unsigned int, miter_len, + PAGE_SIZE - offset); + + ret = process_chunk(ns, miter.addr + miter_offset, + sect, copy_len); + if (ret) { + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out; + } + + sect += copy_len >> SECTOR_SHIFT; + miter_offset += copy_len; + miter_len -= copy_len; + } + } + +out: + sg_miter_stop(&miter); + nvmet_req_complete(req, status); +} + +/* + * Flush command - no-op for memory backend 
(no persistent storage). + */ +static void nvmet_mem_execute_flush(struct nvmet_req *req) +{ + if (!nvmet_check_transfer_len(req, 0)) + return; + nvmet_req_complete(req, NVME_SC_SUCCESS); +} + +/* + * Discard/TRIM command - delete pages in the range. + * With PAGE_SIZE LBAs, each LBA maps to exactly one page, + * so we simply delete the corresponding pages from xarray. + */ +static void nvmet_mem_execute_discard(struct nvmet_req *req) +{ + struct nvmet_ns *ns = req->ns; + struct nvme_dsm_range range; + u16 status = NVME_SC_SUCCESS; + int i; + + for (i = 0; i < le32_to_cpu(req->cmd->dsm.nr) + 1; i++) { + sector_t sect, nr_sect; + pgoff_t idx; + struct page *page; + + status = nvmet_copy_from_sgl(req, i * sizeof(range), &range, + sizeof(range)); + if (status) + break; + + status = nvmet_mem_check_range(ns, le64_to_cpu(range.slba), + le32_to_cpu(range.nlb)); + if (status) + break; + + sect = nvmet_lba_to_sect(ns, range.slba); + nr_sect = le32_to_cpu(range.nlb) << + (ns->blksize_shift - SECTOR_SHIFT); + + /* Skip zero-length discard */ + if (nr_sect == 0) + continue; + + /* + * With PAGE_SIZE LBAs, sectors align with pages. + * Delete all pages in range. + */ + xa_lock(&ns->mem_pages); + for (idx = SECTOR_TO_PAGE_IDX(sect); + idx < SECTOR_TO_PAGE_IDX(sect + nr_sect); + idx++) { + page = __xa_erase(&ns->mem_pages, idx); + if (page) { + put_page(page); + ns->mem_nr_pages--; + } + } + xa_unlock(&ns->mem_pages); + } + + nvmet_req_complete(req, status); +} + +/* + * Write Zeroes command - allocate zeroed pages in the range. + * With PAGE_SIZE LBAs, each LBA maps to exactly one page. + * Allocate pages and zero them (allocation gives zeroed pages). + */ +static void nvmet_mem_execute_write_zeroes(struct nvmet_req *req) +{ + struct nvme_write_zeroes_cmd *wz = &req->cmd->write_zeroes; + struct nvmet_ns *ns = req->ns; + sector_t nr_sects, sect; + pgoff_t idx, start_idx, end_idx; + struct page *page; + void *kaddr; + u16 status = NVME_SC_SUCCESS; + + status = nvmet_mem_check_range(ns, le64_to_cpu(wz->slba), + le16_to_cpu(wz->length) + 1); + if (status) + goto out; + + sect = nvmet_lba_to_sect(ns, wz->slba); + nr_sects = (le16_to_cpu(wz->length) + 1) << + (ns->blksize_shift - SECTOR_SHIFT); + + start_idx = SECTOR_TO_PAGE_IDX(sect); + end_idx = SECTOR_TO_PAGE_IDX(sect + nr_sects - 1); + + /* + * With PAGE_SIZE LBAs, sectors align with pages. + * Allocate and zero all pages in range. + */ + for (idx = start_idx; idx <= end_idx; idx++) { + sector_t pg_sect = idx << (PAGE_SHIFT - SECTOR_SHIFT); + + page = nvmet_mem_insert_page(ns, pg_sect); + if (IS_ERR(page)) { + status = NVME_SC_INTERNAL | NVME_STATUS_DNR; + goto out; + } + + kaddr = kmap_local_page(page); + memset(kaddr, 0, PAGE_SIZE); + kunmap_local(kaddr); + + put_page(page); + } + +out: + nvmet_req_complete(req, status); +} + +/* + * Setup namespace for memory backend. + */ +int nvmet_mem_ns_enable(struct nvmet_ns *ns) +{ + if (!ns->size) { + pr_err("memory backend: namespace size not configured\n"); + return -EINVAL; + } + + xa_init(&ns->mem_pages); + ns->mem_nr_pages = 0; + + /* Set block size shift - memory backend uses page-sized LBAs */ + ns->blksize_shift = PAGE_SHIFT; /* 2^12 = 4096 bytes */ + + pr_info("memory backend: enabled namespace %u, size %lld, lba_size %u\n", + ns->nsid, ns->size, 1 << ns->blksize_shift); + + return 0; +} + +/* + * Disable namespace and free all pages. 
+ */ +void nvmet_mem_ns_disable(struct nvmet_ns *ns) +{ + unsigned long nr_freed = 0; + struct page *pages[32]; /* Batch size */ + unsigned long idx; + int count, i; + + /* Free all allocated pages using batch collection */ + do { + count = 0; + idx = 0; + + xa_lock(&ns->mem_pages); + while (count < 32) { + pages[count] = xa_find(&ns->mem_pages, &idx, + ULONG_MAX, XA_PRESENT); + if (!pages[count]) + break; + __xa_erase(&ns->mem_pages, idx); + count++; + idx++; + } + xa_unlock(&ns->mem_pages); + + for (i = 0; i < count; i++) { + put_page(pages[i]); + nr_freed++; + } + + cond_resched(); + + } while (count > 0); + + xa_destroy(&ns->mem_pages); + ns->mem_nr_pages = 0; + + pr_info("memory backend: disabled namespace %u, freed %lu pages\n", + ns->nsid, nr_freed); +} + +void nvmet_mem_ns_revalidate(struct nvmet_ns *ns) +{ + /* Nothing to revalidate for memory backend */ +} + +u16 nvmet_mem_parse_io_cmd(struct nvmet_req *req) +{ + struct nvme_command *cmd = req->cmd; + + switch (cmd->common.opcode) { + case nvme_cmd_read: + case nvme_cmd_write: + req->execute = nvmet_mem_execute_rw; + return 0; + case nvme_cmd_flush: + req->execute = nvmet_mem_execute_flush; + return 0; + case nvme_cmd_dsm: + req->execute = nvmet_mem_execute_discard; + return 0; + case nvme_cmd_write_zeroes: + req->execute = nvmet_mem_execute_write_zeroes; + return 0; + default: + return NVME_SC_INVALID_OPCODE | NVME_STATUS_DNR; + } +} diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index b73d9589e043..1deea1527700 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -128,6 +128,10 @@ struct nvmet_ns { u8 csi; struct nvmet_pr pr; struct xarray pr_per_ctrl_refs; + + /* Memory backend support */ + struct xarray mem_pages; + u64 mem_nr_pages; /* Protected by mem_pages xa_lock */ }; static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item) @@ -706,14 +710,18 @@ bool nvmet_host_allowed(struct nvmet_subsys *subsys, const char *hostnqn); int nvmet_bdev_ns_enable(struct nvmet_ns *ns); int nvmet_file_ns_enable(struct nvmet_ns *ns); +int nvmet_mem_ns_enable(struct nvmet_ns *ns); void nvmet_bdev_ns_disable(struct nvmet_ns *ns); void nvmet_file_ns_disable(struct nvmet_ns *ns); +void nvmet_mem_ns_disable(struct nvmet_ns *ns); u16 nvmet_bdev_flush(struct nvmet_req *req); u16 nvmet_file_flush(struct nvmet_req *req); void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid); void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns); void nvmet_file_ns_revalidate(struct nvmet_ns *ns); +void nvmet_mem_ns_revalidate(struct nvmet_ns *ns); bool nvmet_ns_revalidate(struct nvmet_ns *ns); +u16 nvmet_mem_parse_io_cmd(struct nvmet_req *req); u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts); bool nvmet_bdev_zns_enable(struct nvmet_ns *ns); -- 2.40.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
2025-11-04 8:06 [PATCH 0/1] nvmet: add basic in-memory backend support Chaitanya Kulkarni
2025-11-04 8:06 ` [PATCH 1/1] " Chaitanya Kulkarni
@ 2025-11-04 10:36 ` Hannes Reinecke
2025-11-05 0:09 ` Chaitanya Kulkarni
2025-11-06 2:54 ` Keith Busch
1 sibling, 2 replies; 12+ messages in thread
From: Hannes Reinecke @ 2025-11-04 10:36 UTC (permalink / raw)
To: Chaitanya Kulkarni, linux-nvme; +Cc: hch, sagi, kch

On 11/4/25 09:06, Chaitanya Kulkarni wrote:
> Hi,
>
> Add a new memory backend (io-cmd-mem.c) that provides RAM-backed storage
> for NVMe target namespaces, enabling high-performance volatile storage
> without requiring physical block devices or filesystem backing.
>
> * Implementation Overview:
> ==========================
> The memory backend introduces a new namespace configuration option via
> configfs that allows users to create memory-backed namespaces by
> setting the 'mem_size' attribute instead of 'device_path'.
> 1. Lazy Page Allocation
> - Uses xarray for sparse page storage
> - Pages are allocated on first write (copy-on-write semantics)
> 2. Configfs Interface
> New attribute: ${NVMET_CFGFS}/subsystems/<subsys>/namespaces/<ns>/mem_size
> - Accepts size in bytes (e.g., "1073741824" for 1GB)
> - Can only be set when namespace is disabled
> - Mutually exclusive with device_path attribute
> - Limited to 80% of total system memory for safety
> 3. I/O Command Support
> - Read
> - Write
> - Flush
> - Discard
> - Write Zeroes
> 4. Backend Detection Logic
> Namespace backend is selected based on configuration:
> - If mem_size is set (no device_path): Use memory backend
> - If device_path points to block device: Use bdev backend
> - If device_path points to regular file: Use file backend
> 5. The implementation follows existing nvmet backend pattern with three
> main entry points:
> - nvmet_mem_ns_enable(): Initialize namespace with xarray storage
> - nvmet_mem_ns_disable(): Release all pages and cleanup
> - nvmet_mem_parse_io_cmd(): Dispatch I/O commands to handlers
>
> I/O processing uses scatter-gather iteration similar to existing backends,
> with per-page operations that handle alignment and boundary cases.
>

Question remains: why?
We already have at least two other memory-backed devices (brd, null_blk)
which should do the job just nicely.
Why do we need to add another one?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Kernel Storage Architect
hare@suse.de                                 +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-04 10:36 ` [PATCH 0/1] " Hannes Reinecke
@ 2025-11-05  0:09   ` Chaitanya Kulkarni
  2025-11-05 13:14     ` hch
  2025-11-06  2:54     ` Keith Busch
  1 sibling, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-05 0:09 UTC (permalink / raw)
To: Hannes Reinecke, Chaitanya Kulkarni, linux-nvme@lists.infradead.org
Cc: hch@lst.de, sagi@grimberg.me, Chaitanya Kulkarni

> Question remains: why?
> We already have at least two other memory-backed devices (brd, null_blk)
> which should do the job just nicely.
> Why do we need to add another one?
>
> Cheers,
>
> Hannes

Thanks for looking into this.

brd and null_blk require going through the block layer (bio/request
allocation), which adds unnecessary overhead for each I/O and forces
block layer infrastructure just to access pages. The block layer has
already processed these I/Os on the host side, which already provides
the block device infrastructure and the block device view; with the
nvmet-mem-backend there is no need to go through it again on the target.

1. Fundamentally wrong abstraction when accessing the memory (pages):

   * NVMe Host -> Fabric -> nvmet -> block layer -> brd/null_blk -> memory
                                     ^ unnecessary infrastructure to access memory
   * NVMe Host -> Fabric -> nvmet -> memory

   May I please know what the advantage of this intermediary level is,
   when accessing the pages has no conceptual obligation to go through
   the block layer infra that null_blk and brd force, unlike the
   nvmet-mem-backend?

2. Unnecessary processing overhead:

   - bio_alloc() + mempool_alloc()
   - bio_add_page() for each page
   - submit_bio() -> submit_bio_noacct()
   - blk_mq_submit_bio()
   - Request allocation and request queue processing
   - Tag allocation
   - Plug/unplug handling
   - Queue lock contention
   - I/O scheduler invocation (noop processing)
   - bio_for_each_segment() iteration
   - bio_endio() + mempool_free()
   # Code path: ~25-30 function calls

   Even for brd, although it doesn't require request allocation, you
   still need to:
   - allocate the bio
   - convert the nvmet req->sg pages to bio bvecs
   - convert the bio bvecs back into brd's xarray pages
   - and vice versa, for every single I/O

   May I please know what the advantage of this unnecessary conversion
   is, when accessing the pages can be done efficiently with the
   nvmet-mem-backend?

   nvmet-mem direct path:
   - struct sg_mapping_iter: ~100 bytes (stack, reused, zero heap)
   - sg_miter_start() on the nvmet request SGL
   - kmap_local_page() -> memcpy() -> kunmap_local()
   # Code path: ~8-10 function calls

3. Block layer usage model

   The block layer provides critical functionality for _real block
   devices_:
   - I/O scheduling (deadline, mq-deadline, etc.)
   - Request merging
   - Partitioning
   - Device queue limits
   - Plug/unplug batching
   - Multi-queue infrastructure
   - Device abstraction

   For memory-backed nvmet namespaces, _none_ of these apply:
   - No scheduling needed - memory has no seek time to optimize
   - No merging needed - no device to batch requests to
   - No partitions - nvmet exposes namespaces, not block devices
   - No queue limits - memory has no device-specific constraints
   - No batching needed - completion is synchronous
   - No multi-queue needed - no hardware queues to distribute to
   - No abstraction needed - we ARE the storage backend

   so the whole list of calls mentioned in #2 can go away. Every
   bio/request allocation is pure waste for this use case.

-ck

^ permalink raw reply	[flat|nested] 12+ messages in thread
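The "direct path" argued for above is easier to picture with a sketch. The
following is a minimal, hedged illustration of an SGL-iteration copy loop
against an xarray of lazily allocated pages; it is not the patch code. The
helper name nvmet_mem_copy_sgl(), the simplified error handling, and the
lack of serialization against concurrent insertions are assumptions made
only for this example.

/*
 * Illustrative sketch only, not the posted patch: copy between the nvmet
 * request SGL and pages stored in ns->mem_pages. A real backend must also
 * serialize concurrent insertions (e.g. xa_cmpxchg()) and account pages.
 */
static u16 nvmet_mem_copy_sgl(struct nvmet_ns *ns, struct nvmet_req *req,
			      u64 pos, bool is_write)
{
	struct sg_mapping_iter miter;
	unsigned int dir = is_write ? SG_MITER_FROM_SG : SG_MITER_TO_SG;

	/* Walk the request SGL one mapped segment at a time. */
	sg_miter_start(&miter, req->sg, req->sg_cnt, dir);
	while (sg_miter_next(&miter)) {
		size_t done = 0;

		while (done < miter.length) {
			pgoff_t idx = pos >> PAGE_SHIFT;
			size_t off = offset_in_page(pos);
			size_t len = min_t(size_t, miter.length - done,
					   PAGE_SIZE - off);
			struct page *page = xa_load(&ns->mem_pages, idx);
			void *vaddr;

			if (!page && is_write) {
				/* Lazy allocation on first write. */
				page = alloc_page(GFP_KERNEL | __GFP_ZERO);
				if (!page)
					goto out_err;
				if (xa_err(xa_store(&ns->mem_pages, idx, page,
						    GFP_KERNEL))) {
					__free_page(page);
					goto out_err;
				}
			}

			if (page) {
				vaddr = kmap_local_page(page);
				if (is_write)
					memcpy(vaddr + off, miter.addr + done, len);
				else
					memcpy(miter.addr + done, vaddr + off, len);
				kunmap_local(vaddr);
			} else {
				/* Reads of never-written blocks return zeroes. */
				memset(miter.addr + done, 0, len);
			}

			pos += len;
			done += len;
		}
	}
	sg_miter_stop(&miter);
	return NVME_SC_SUCCESS;

out_err:
	sg_miter_stop(&miter);
	return NVME_SC_INTERNAL | NVME_STATUS_DNR;
}

The point of the comparison with brd/null_blk is simply that no bio and no
request object is allocated anywhere in this loop.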
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-05  0:09 ` Chaitanya Kulkarni
@ 2025-11-05 13:14   ` hch
  2025-11-06  1:02     ` Chaitanya Kulkarni
  2025-11-06  1:20     ` Chaitanya Kulkarni
  0 siblings, 2 replies; 12+ messages in thread
From: hch @ 2025-11-05 13:14 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: Hannes Reinecke, Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
    hch@lst.de, sagi@grimberg.me

But what is the use case that requires removing all that overhead /
indirection?

I think you need to describe that very clearly to make a case. And
maybe drop a lot of the marketing-sounding, overly dramatic language
that really does not help the case.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-05 13:14 ` hch
@ 2025-11-06  1:02   ` Chaitanya Kulkarni
  2025-11-06  1:03     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-06 1:02 UTC (permalink / raw)
To: hch@lst.de, Hannes Reinecke
Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, sagi@grimberg.me

Hannes and Christoph,

On 11/5/25 05:14, hch@lst.de wrote:
> But what is the use case that requires removing all that overhead /
> indirection?
>
> I think you need to describe that very clearly to make a case. And
> maybe drop a lot of the marketing-sounding, overly dramatic language
> that really does not help the case.

Here is the quantitative data showing that removing all the
overhead/indirection gives better performance for the nvmet-mem backend
against memory-backed null_blk and brd (see [1] for raw data).

#####################################################################
SUMMARY
#####################################################################
Note: All values are averages of 3 test iterations per category.

=====================================================================
Summary of Dataset 1: perf-results-20251105-102906 (48 FIO Jobs, 5GB)
=====================================================================

* nvmet-mem Performance vs null_blk:
-------------------------------------------------------------------
Metric      Test       null_blk      nvmet-mem     nvmet-mem % +/-
-------------------------------------------------------------------
IOPS        randread   559,828.91    638,364.83    +14.03%
            randwrite  563,724.18    617,446.95    +9.53%
BW (MiB/s)  randread   2,186.83      2,493.61      +14.03%
            randwrite  2,202.05      2,411.90      +9.53%

* nvmet-mem Performance vs BRD:
-------------------------------------------------------------------
Metric      Test       BRD           nvmet-mem     nvmet-mem % +/-
-------------------------------------------------------------------
IOPS        randread   572,101.88    638,364.83    +11.58%
            randwrite  590,480.73    617,446.95    +4.57%
BW (MiB/s)  randread   2,234.77      2,493.61      +11.58%
            randwrite  2,306.57      2,411.90      +4.57%

=====================================================================
Summary of Dataset 2: perf-results-20251105-120239 (48 FIO Jobs, 5GB)
=====================================================================

* nvmet-mem Performance vs null_blk:
-------------------------------------------------------------------
Metric      Test       null_blk      nvmet-mem     nvmet-mem % +/-
-------------------------------------------------------------------
IOPS        randread   556,310.23    612,604.62    +10.12%
            randwrite  558,665.03    609,816.62    +9.16%
BW (MiB/s)  randread   2,173.09      2,392.99      +10.12%
            randwrite  2,182.29      2,382.10      +9.16%

* nvmet-mem Performance vs BRD:
-------------------------------------------------------------------
Metric      Test       BRD           nvmet-mem     nvmet-mem % +/-
-------------------------------------------------------------------
IOPS        randread   576,684.10    612,604.62    +6.23%
            randwrite  564,228.76    609,816.62    +8.08%
BW (MiB/s)  randread   2,252.67      2,392.99      +6.23%
            randwrite  2,204.02      2,382.10      +8.08%

=====================================================================
Summary of Dataset 3: perf-results-20251105-160213 (48 FIO Jobs, 5GB)
=====================================================================

* nvmet-mem Performance vs null_blk:
--------------------------------------------------------------------
Metric      Test       null_blk      nvmet-mem     nvmet-mem % +/-
--------------------------------------------------------------------
IOPS        randread   556,333.33    619,666.67    +11.38%
            randwrite  561,333.33    623,000.00    +10.99%
BW (MiB/s)  randread   2,173.00      2,420.33      +11.38%
            randwrite  2,192.00      2,432.33      +10.96%

* nvmet-mem Performance vs BRD:
--------------------------------------------------------------------
Metric      Test       BRD           nvmet-mem     nvmet-mem % +/-
--------------------------------------------------------------------
IOPS        randread   572,666.67    619,666.67    +8.21%
            randwrite  591,333.33    623,000.00    +5.36%
BW (MiB/s)  randread   2,237.00      2,420.33      +8.20%
            randwrite  2,310.00      2,432.33      +5.30%

May I please know if this is acceptable? If so, I'll update the commit
log along with addressing the other review comments...

-ck

[1]
=====================================================================
Performance Comparison Tables: nvmet-mem vs null_blk vs BRD
=====================================================================
Test Configuration:
    Namespace Size:  5GB
    FIO Jobs:        48
    IO Depth:        64
    Test Iterations: 3 per category (results shown are averages)
    Tests:           randread, randwrite
    Backends:        null_blk (memory-backed), nvmet-mem, BRD

Machine Information:
    CPU:    AMD Ryzen Threadripper PRO 3975WX 32-Cores (64 threads)
    Memory: 62 GiB
    Kernel: 6.17.0-rc3nvme+
=====================================================================
Test methodology:

for REP in 1 2; do
    for BACKEND in null_blk_membacked=1 nvmet-mem brd; do
        setup NVMeOF with $BACKEND ns and connect
        for FIO_JOB in fio-verify fio-randwrite fio-randread; do
            for i in 1 2 3; do
                fio $FIO_JOB $BACKEND_DEV
                capture results into fio-res-$REP-$i.log
            done
        done
    done
done

############################################################################
RAW DATA
############################################################################
Dataset 1: perf-results-20251105-102906 (48 FIO Jobs, 5GB)
############################################################################

Test Comparison: null_blk vs nvmet-mem (IOPS)
----------------------------------------------------------------------------
Test            null_blk        nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        559,828.91      638,364.83      +14.03%     nvmet-mem
randwrite       563,724.18      617,446.95      +9.53%      nvmet-mem

Individual Iteration Values:
null_blk  randread:  560,901.51 | 558,407.41 | 560,177.82 = 559,828.91
nvmet-mem randread:  595,460.64 | 653,900.97 | 665,732.88 = 638,364.83
null_blk  randwrite: 563,323.97 | 562,106.23 | 565,742.33 = 563,724.18
nvmet-mem randwrite: 640,550.91 | 617,786.76 | 594,003.17 = 617,446.95

Test Comparison: BRD vs nvmet-mem (IOPS)
----------------------------------------------------------------------------
Test            BRD             nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        572,101.88      638,364.83      +11.58%     nvmet-mem
randwrite       590,480.73      617,446.95      +4.57%      nvmet-mem

Individual Iteration Values:
BRD       randread:  574,151.01 | 568,049.86 | 574,104.76 = 572,101.88
nvmet-mem randread:  595,460.64 | 653,900.97 | 665,732.88 = 638,364.83
BRD       randwrite: 620,592.07 | 574,959.80 | 575,890.31 = 590,480.73
nvmet-mem randwrite: 640,550.91 | 617,786.76 | 594,003.17 = 617,446.95

Test Comparison: null_blk vs nvmet-mem (BW MiB/s)
----------------------------------------------------------------------------
Test            null_blk        nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        2,186.83        2,493.61        +14.03%     nvmet-mem
randwrite       2,202.05        2,411.90        +9.53%      nvmet-mem

Individual Iteration Values:
null_blk  randread:  2,191.02 | 2,181.28 | 2,188.19 = 2,186.83
nvmet-mem randread:  2,326.02 | 2,554.30 | 2,600.52 = 2,493.61
null_blk  randwrite: 2,200.48 | 2,195.73 | 2,209.93 = 2,202.05
nvmet-mem randwrite: 2,502.15 | 2,413.23 | 2,320.32 = 2,411.90

Test Comparison: BRD vs nvmet-mem (BW MiB/s)
----------------------------------------------------------------------------
Test            BRD             nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        2,234.77        2,493.61        +11.58%     nvmet-mem
randwrite       2,306.57        2,411.90        +4.57%      nvmet-mem

Individual Iteration Values:
BRD       randread:  2,242.78 | 2,218.94 | 2,242.60 = 2,234.77
nvmet-mem randread:  2,326.02 | 2,554.30 | 2,600.52 = 2,493.61
BRD       randwrite: 2,424.19 | 2,245.94 | 2,249.57 = 2,306.57
nvmet-mem randwrite: 2,502.15 | 2,413.23 | 2,320.32 = 2,411.90

############################################################################
Dataset 2: perf-results-20251105-120239 (48 FIO Jobs, 5GB)
############################################################################

Test Comparison: null_blk vs nvmet-mem (IOPS)
----------------------------------------------------------------------------
Test            null_blk        nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        556,310.23      612,604.62      +10.12%     nvmet-mem
randwrite       558,665.03      609,816.62      +9.16%      nvmet-mem

Individual Iteration Values:
null_blk  randread:  555,694.56 | 557,269.21 | 555,966.92 = 556,310.23
nvmet-mem randread:  564,060.01 | 614,484.34 | 659,269.52 = 612,604.62
null_blk  randwrite: 561,266.97 | 557,400.91 | 557,327.21 = 558,665.03
nvmet-mem randwrite: 629,159.63 | 577,127.87 | 623,162.36 = 609,816.62

Test Comparison: BRD vs nvmet-mem (IOPS)
----------------------------------------------------------------------------
Test            BRD             nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        576,684.10      612,604.62      +6.23%      nvmet-mem
randwrite       564,228.76      609,816.62      +8.08%      nvmet-mem

Individual Iteration Values:
BRD       randread:  559,428.70 | 612,435.76 | 558,187.85 = 576,684.10
nvmet-mem randread:  564,060.01 | 614,484.34 | 659,269.52 = 612,604.62
BRD       randwrite: 562,967.74 | 565,558.21 | 564,160.34 = 564,228.76
nvmet-mem randwrite: 629,159.63 | 577,127.87 | 623,162.36 = 609,816.62

Test Comparison: null_blk vs nvmet-mem (BW MiB/s)
----------------------------------------------------------------------------
Test            null_blk        nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        2,173.09        2,392.99        +10.12%     nvmet-mem
randwrite       2,182.29        2,382.10        +9.16%      nvmet-mem

Individual Iteration Values:
null_blk  randread:  2,170.68 | 2,176.83 | 2,171.75 = 2,173.09
nvmet-mem randread:  2,203.36 | 2,400.33 | 2,575.27 = 2,392.99
null_blk  randwrite: 2,192.45 | 2,177.35 | 2,177.06 = 2,182.29
nvmet-mem randwrite: 2,457.65 | 2,254.41 | 2,434.23 = 2,382.10

Test Comparison: BRD vs nvmet-mem (BW MiB/s)
----------------------------------------------------------------------------
Test            BRD             nvmet-mem       Diff (%)    Winner
----------------------------------------------------------------------------
randread        2,252.67        2,392.99        +6.23%      nvmet-mem
randwrite       2,204.02        2,382.10        +8.08%      nvmet-mem

Individual Iteration Values:
BRD       randread:  2,185.27 | 2,392.33 | 2,180.42 = 2,252.67
nvmet-mem randread:  2,203.36 | 2,400.33 | 2,575.27 = 2,392.99
BRD       randwrite: 2,199.09 | 2,209.21 | 2,203.75 = 2,204.02
nvmet-mem randwrite: 2,457.65 | 2,254.41 | 2,434.23 = 2,382.10

^ permalink raw reply	[flat|nested] 12+ messages in thread
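The exact fio job file behind the numbers above is not included in the
thread; a minimal invocation consistent with the stated parameters (48
jobs, queue depth 64, random read/write against the connected NVMe-oF
namespace) might look like the sketch below. The 4k block size, io_uring
engine, runtime, and device path are assumptions, not taken from the
posted results.

#!/bin/bash
# Hypothetical reproduction script: bs, ioengine, runtime and DEV are
# assumptions made for illustration, not from the original posting.
DEV=/dev/nvme1n1          # NVMe-oF namespace exposed by the target

for rw in randread randwrite; do
	for i in 1 2 3; do
		fio --name=nvmet-mem-${rw} \
		    --filename=${DEV} \
		    --rw=${rw} --bs=4k --direct=1 \
		    --ioengine=io_uring --iodepth=64 --numjobs=48 \
		    --time_based --runtime=60 --group_reporting \
		    --output=fio-res-${rw}-${i}.log
	done
done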
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-06  1:02 ` Chaitanya Kulkarni
@ 2025-11-06  1:03   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-06 1:03 UTC (permalink / raw)
To: hch@lst.de, Hannes Reinecke
Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, sagi@grimberg.me

Messed up the format, let me resend again.

-ck

On 11/5/25 17:02, Chaitanya Kulkarni wrote:
> Hannes and Christoph,
>
> On 11/5/25 05:14, hch@lst.de wrote:
>> But what is the use case that requires removing all that overhead /
>> indirection?
>>
>> I think you need to describe that very clearly to make a case. And
>> maybe drop a lot of the marketing-sounding, overly dramatic language
>> that really does not help the case.
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-05 13:14 ` hch
  2025-11-06  1:02 ` Chaitanya Kulkarni
@ 2025-11-06  1:20   ` Chaitanya Kulkarni
  2025-11-06 11:50     ` hch
  1 sibling, 1 reply; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-06 1:20 UTC (permalink / raw)
To: hch@lst.de, Hannes Reinecke
Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, sagi@grimberg.me

Hi Hannes and Christoph,

On 11/5/25 05:14, hch@lst.de wrote:
> But what is the use case that requires removing all that overhead /
> indirection?
>
> I think you need to describe that very clearly to make a case. And
> maybe drop a lot of the marketing-sounding, overly dramatic language
> that really does not help the case.

Here is the quantitative data showing that removing all the
overhead/indirection gives better performance for the nvmet-mem backend
against memory-backed null_blk and brd. The comparative data is
available at:

https://raw.githubusercontent.com/ckoolkarni/nvmet-mem-brd-null-blk-perf/refs/heads/main/README.md

If this is acceptable, I'll update the git commit log along with
addressing the other comments.

-ck

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-06  1:20 ` Chaitanya Kulkarni
@ 2025-11-06 11:50   ` hch
  2025-11-10  3:59     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 12+ messages in thread
From: hch @ 2025-11-06 11:50 UTC (permalink / raw)
To: Chaitanya Kulkarni
Cc: hch@lst.de, Hannes Reinecke, Chaitanya Kulkarni,
    linux-nvme@lists.infradead.org, sagi@grimberg.me

On Thu, Nov 06, 2025 at 01:20:41AM +0000, Chaitanya Kulkarni wrote:
> Hi Hannes and Christoph,
>
> On 11/5/25 05:14, hch@lst.de wrote:
> > But what is the use case that requires removing all that overhead /
> > indirection?
> >
> > I think you need to describe that very clearly to make a case. And
> > maybe drop a lot of the marketing-sounding, overly dramatic language
> > that really does not help the case.
>
> Here is the quantitative data showing that removing all the
> overhead/indirection gives better performance for the nvmet-mem backend
> against memory-backed null_blk and brd. The comparative data is
> available at:

But what is the use case? Why do you care about performance of a
non-persistent DRAM-bound backend?

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-06 11:50 ` hch
@ 2025-11-10  3:59   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-10 3:59 UTC (permalink / raw)
To: hch@lst.de
Cc: Hannes Reinecke, Chaitanya Kulkarni, linux-nvme@lists.infradead.org,
    sagi@grimberg.me

Christoph,

On 11/6/25 03:50, hch@lst.de wrote:
> On Thu, Nov 06, 2025 at 01:20:41AM +0000, Chaitanya Kulkarni wrote:
>> Here is the quantitative data showing that removing all the
>> overhead/indirection gives better performance for the nvmet-mem backend
>> against memory-backed null_blk and brd. The comparative data is
>> available at:
>
> But what is the use case? Why do you care about performance of a
> non-persistent DRAM-bound backend?

1. From the cover letter: workloads requiring dynamic allocation of
   high-performance temporary storage:
   - AI/ML training scratch space for checkpointing and intermediate data
   - Data analytics shuffle storage
   - In-memory database overflow
   - Target resources exported and assigned to host VMs as backend
     storage. Fast ephemeral storage allows VMs to access remote DRAM via
     NVMe-oF, combining the remote backend storage and the associated
     scratch space under one subsystem with a controller in it.

   These workloads need sub-millisecond namespace creation via NVMe-oF
   without traditional storage provisioning or infrastructure.

2. Migration from the LIO target:
   Deployments migrating from LIO to NVMe-oF need equivalent
   functionality. LIO has supported a memory backend (target_core_rd)
   since its inception, for efficient host access to target DRAM
   resources. Applications using LIO require this when moving to an
   NVMe-oF target. LIO's target_core_rd has never been removed, which
   points to active production use.

3. Performance impact on TCO calculations:
   Hannes's comment that brd and null_blk should do the job is only
   partially correct: they work, but at measurably lower performance.
   Relying on them also creates a misleading impression of the maximum
   performance achievable (and hence wrong TCO calculations) when using
   NVMe-oF targets with DRAM:

   brd:      randread  6-11% degradation vs nvmet-mem (avg of 3)
             randwrite 4-8%  degradation vs nvmet-mem (avg of 3)
   null_blk: randread  10-14% degradation vs nvmet-mem (avg of 3)
             randwrite ~9%   degradation vs nvmet-mem (avg of 3)

   I don't have access to every NVMe-oF deployment that uses brd or
   null_blk, but a significant improvement from reduced overhead gives
   better, and more accurate, TCO numbers.

4. Why should we care?
   Kernel implementations should be highly optimized, something evident
   from all the micro-optimization patches I've reviewed. Compared to
   those, the ~10% improvement here is an order of magnitude larger.
   Hence we should move existing NVMe-oF brd/null_blk users to the
   native nvmet-mem backend.

Please let me know what else I can provide to help this.

-ck

^ permalink raw reply	[flat|nested] 12+ messages in thread
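To make the "sub-millisecond namespace creation" workflow above concrete,
here is a hedged sketch of the configfs sequence for a memory-backed
namespace over the loop transport. The subsystem NQN, namespace ID, and
port number are placeholders; only the mem_size attribute comes from this
patch, everything else is the standard nvmet configfs layout and nvme-cli
usage.

#!/bin/bash
# Sketch only: NQN, nsid and port number are arbitrary placeholders.
CFG=/sys/kernel/config/nvmet
NQN=testnqn-mem

mkdir ${CFG}/subsystems/${NQN}
echo 1 > ${CFG}/subsystems/${NQN}/attr_allow_any_host

mkdir ${CFG}/subsystems/${NQN}/namespaces/1
echo 10737418240 > ${CFG}/subsystems/${NQN}/namespaces/1/mem_size  # 10GB, no device_path
echo 1 > ${CFG}/subsystems/${NQN}/namespaces/1/enable

mkdir ${CFG}/ports/1
echo loop > ${CFG}/ports/1/addr_trtype
ln -s ${CFG}/subsystems/${NQN} ${CFG}/ports/1/subsystems/${NQN}

nvme connect -t loop -n ${NQN}

# Teardown: disabling the namespace releases all lazily allocated pages.
# nvme disconnect -n ${NQN}
# echo 0 > ${CFG}/subsystems/${NQN}/namespaces/1/enable

No storage provisioning step exists in this sequence; the only "capacity"
consumed up front is the configfs entry itself.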
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-04 10:36 ` [PATCH 0/1] " Hannes Reinecke
  2025-11-05  0:09   ` Chaitanya Kulkarni
@ 2025-11-06  2:54   ` Keith Busch
  2025-11-06  2:58     ` Chaitanya Kulkarni
  1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2025-11-06 2:54 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: Chaitanya Kulkarni, linux-nvme, hch, sagi, kch

On Tue, Nov 04, 2025 at 11:36:36AM +0100, Hannes Reinecke wrote:
> We already have at least two other memory-backed devices (brd, null_blk)
> which should do the job just nicely.

nvmet also supports file backends, so we have tmpfs as another
memory-backed option.

^ permalink raw reply	[flat|nested] 12+ messages in thread
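For reference, the existing alternative Keith mentions looks roughly like
the sketch below: a regular file on tmpfs exposed through the nvmet file
backend's device_path attribute. The mount point, file name, and sizes
are placeholders chosen for illustration.

#!/bin/bash
# Sketch of the tmpfs + file-backend alternative; paths are placeholders.
CFG=/sys/kernel/config/nvmet
NQN=testnqn-tmpfs

mount -t tmpfs -o size=6G tmpfs /mnt/nvmet-tmpfs
truncate -s 5G /mnt/nvmet-tmpfs/ns1.img

mkdir ${CFG}/subsystems/${NQN}
mkdir ${CFG}/subsystems/${NQN}/namespaces/1
echo /mnt/nvmet-tmpfs/ns1.img > ${CFG}/subsystems/${NQN}/namespaces/1/device_path
echo 1 > ${CFG}/subsystems/${NQN}/namespaces/1/enable

This path goes through the VFS and the nvmet file backend, which is the
extra layering the following reply objects to.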
* Re: [PATCH 0/1] nvmet: add basic in-memory backend support
  2025-11-06  2:54 ` Keith Busch
@ 2025-11-06  2:58   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2025-11-06 2:58 UTC (permalink / raw)
To: Keith Busch
Cc: Chaitanya Kulkarni, linux-nvme@lists.infradead.org, hch@lst.de,
    sagi@grimberg.me, Chaitanya Kulkarni, Hannes Reinecke

On 11/5/25 18:54, Keith Busch wrote:
> On Tue, Nov 04, 2025 at 11:36:36AM +0100, Hannes Reinecke wrote:
>> We already have at least two other memory-backed devices (brd, null_blk)
>> which should do the job just nicely.
> nvmet also supports file backends, so we have tmpfs as another
> memory-backed option.

That requires creating a file and going through the file interface. Why
would memory access need such intermediary levels when it can be done
directly from the NVMe-oF target?

Also, if in the future we want to extend or implement any NVMe-specific
feature, any such intermediary layers will make that hard. With the
nvmet-mem backend we can implement it directly in the target.

-ck

^ permalink raw reply	[flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-11-10  3:59 UTC | newest]

Thread overview: 12+ messages
2025-11-04  8:06 [PATCH 0/1] nvmet: add basic in-memory backend support Chaitanya Kulkarni
2025-11-04  8:06 ` [PATCH 1/1] " Chaitanya Kulkarni
2025-11-04 10:36 ` [PATCH 0/1] " Hannes Reinecke
2025-11-05  0:09   ` Chaitanya Kulkarni
2025-11-05 13:14     ` hch
2025-11-06  1:02       ` Chaitanya Kulkarni
2025-11-06  1:03         ` Chaitanya Kulkarni
2025-11-06  1:20       ` Chaitanya Kulkarni
2025-11-06 11:50         ` hch
2025-11-10  3:59           ` Chaitanya Kulkarni
2025-11-06  2:54   ` Keith Busch
2025-11-06  2:58     ` Chaitanya Kulkarni