public inbox for kdevops@lists.linux.dev
From: Luis Chamberlain <mcgrof@kernel.org>
To: kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 5/5] guestfs: add large IO support
Date: Wed,  6 Mar 2024 16:14:26 -0800
Message-ID: <20240307001426.565390-6-mcgrof@kernel.org>
In-Reply-To: <20240307001426.565390-1-mcgrof@kernel.org>

Large IO experimentation support was added to Vagrant a while ago to
enable experimentation with LBS support in the kernel [0]. Add the same
support to guestfs, now that guestfs is all the rage and we plan to
deprecate Vagrant support as soon as we can.

Screenshot with NVMe:

$ sudo nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme9n1          /dev/ng9n1            kdevops22            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme8n1          /dev/ng8n1            kdevops21            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme7n1          /dev/ng7n1            kdevops16            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme6n1          /dev/ng6n1            kdevops15            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme5n1          /dev/ng5n1            kdevops14            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme4n1          /dev/ng4n1            kdevops13            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme3n1          /dev/ng3n1            kdevops4             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme2n1          /dev/ng2n1            kdevops3             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme16n1         /dev/ng16n1           kdevops30            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     64 KiB +  0 B   8.2.1
/dev/nvme15n1         /dev/ng15n1           kdevops29            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     64 KiB +  0 B   8.2.1
/dev/nvme14n1         /dev/ng14n1           kdevops27            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme13n1         /dev/ng13n1           kdevops26            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme12n1         /dev/ng12n1           kdevops25            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme11n1         /dev/ng11n1           kdevops24            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme10n1         /dev/ng10n1           kdevops23            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme1n1          /dev/ng1n1            kdevops2             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme0n1          /dev/ng0n1            kdevops1             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1

$ uname -r
6.1.0-9-amd64

To use drives with LBA formats larger than this, we need changes beyond
the LBS patches; that comes later. We have patches that make this
functional up to a 1 MiB LBA format.
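The block-size ladder the new gen_drive_large_io_* macros emit can be
sketched in a few lines. This is a standalone sketch, not part of the
patch: the function and parameter names are illustrative (they mirror the
template knobs), and the defaults are the ones that reproduce the nvme
list output above.

```python
# Standalone sketch, not part of the patch: reproduce the (logical, physical)
# block-size pairs the gen_drive_large_io_* macros emit. Parameter names
# mirror the template knobs; the defaults match the nvme list screenshot
# (512 B compat size, powers of two up to 64 KiB, four drives per size).
def largeio_drives(compat=False, compat_size=512, pow_limit=7, per_space=4):
    """Return (index, lbs, pbs) for each drive the template generates."""
    drives = []
    idx = 1
    max_pbs = compat_size * 2 ** pow_limit
    for n in range(pow_limit + 1):
        pbs = compat_size * 2 ** n          # physical block size for this tier
        for x in range(per_space):
            # In logical-compat mode every drive keeps a small LBA format;
            # otherwise the logical size tracks the physical size.
            lbs = compat_size if compat else pbs
            # pbs_next_two trims drives at the top of the ladder so the
            # largest sizes get fewer drives (mirrors ns3 in the template).
            pbs_next_two = pbs * (2 * (x - 1))
            # Keep only formats usable today: 512 B, 4 KiB, and the
            # LBS range starting at 16 KiB.
            if (pbs in (512, 4096) or pbs >= 16384) and pbs_next_two <= max_pbs:
                drives.append((idx, lbs, pbs))
            idx += 1                        # index advances even when skipped
    return drives

for idx, lbs, pbs in largeio_drives():
    print(f"extra{idx}: logical={lbs} physical={pbs}")
```

With these defaults the sketch yields 17 drives, and the skipped indices
explain the gaps in the serial numbers above: kdevops1-4 (512 B),
kdevops13-16 (4 KiB), kdevops21-24 (16 KiB), kdevops25-27 (32 KiB), and
kdevops29-30 (64 KiB).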

For virtio, we end up failing at virtio device 14, where the block size
first exceeds the page size (0x4000 is 16 KiB):

Mar 06 19:03:55 d5 kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000
Mar 06 19:03:55 d5 kernel: virtio_blk: probe of virtio14 failed with error -22
Mar 06 19:03:55 d5 kernel: virtio_blk virtio15: 8/0/0 default/read/poll queues

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 playbooks/roles/gen_nodes/templates/drives.j2 | 72 +++++++++++++++++++
 .../gen_nodes/templates/guestfs_q35.j2.xml    | 24 +++++++
 scripts/bringup_guestfs.sh                    | 38 +++++++---
 3 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/playbooks/roles/gen_nodes/templates/drives.j2 b/playbooks/roles/gen_nodes/templates/drives.j2
index 4878cff9..676722ae 100644
--- a/playbooks/roles/gen_nodes/templates/drives.j2
+++ b/playbooks/roles/gen_nodes/templates/drives.j2
@@ -40,6 +40,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 <!-- End of virtio drives-->
 {%- endmacro -%}
+{%- macro gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+				    libvirt_largeio_logical_compat_size,
+				    libvirt_largeio_pow_limit,
+				    libvirt_largeio_drives_per_space,
+				    hostname,
+				    libvirt_extra_drive_format,
+				    libvirt_extra_storage_aio_mode,
+				    libvirt_extra_storage_aio_cache_mode,
+				    kdevops_storage_pool_path) -%}
+<!-- These are virtio drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1)  %}
+{% set max_pbs = libvirt_largeio_logical_compat_size  * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size  * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+    <qemu:arg value='-device'/>
+    <qemu:arg value='pcie-root-port,id=pcie-port-for-virtio-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx  }}'/>
+    <qemu:arg value="-object"/>
+    <qemu:arg value="iothread,id=kdevops-virtio-iothread-{{ ns.lbs_idx }}"/>
+    <qemu:arg value='-drive'/>
+    <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,aio={{ libvirt_extra_storage_aio_mode }},cache={{ libvirt_extra_storage_aio_cache_mode }},id=drv{{ ns.lbs_idx }}'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value="virtio-blk-pci,scsi=off,drive=drv{{ ns.lbs_idx }},id=virtio-drv{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-virtio-{{ ns.lbs_idx }},addr=0x0,iothread=kdevops-virtio-iothread-{{ ns.lbs_idx }},logical_block_size={{  ns4.lbs }},physical_block_size={{ ns2.pbs }}"/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of virtio drives for large IO experimentation -->
+{%- endmacro -%}
 {%- macro gen_drive_nvme(num_drives,
 			 kdevops_storage_pool_path,
 			 hostname,
@@ -61,6 +97,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 <!-- End of NVMe drives-->
 {%- endmacro -%}
+{%- macro gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+				  libvirt_largeio_logical_compat_size,
+				  libvirt_largeio_pow_limit,
+				  libvirt_largeio_drives_per_space,
+				  hostname,
+				  libvirt_extra_drive_format,
+				  libvirt_extra_storage_aio_mode,
+				  libvirt_extra_storage_aio_cache_mode,
+				  kdevops_storage_pool_path) -%}
+<!-- These are NVMe drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1)  %}
+{% set max_pbs = libvirt_largeio_logical_compat_size  * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size  * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+    <qemu:arg value='-device'/>
+    <qemu:arg value='pcie-root-port,id=pcie-port-for-nvme-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+    <qemu:arg value='-drive'/>
+    <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,id=drv{{ ns.lbs_idx }}'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value='nvme,id=nvme{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-nvme-{{ ns.lbs_idx }},addr=0x0'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value='nvme-ns,drive=drv{{ ns.lbs_idx }},bus=nvme{{ ns.lbs_idx }},nsid=1,logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}'/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of NVMe drives for large IO experimentation -->
+{%- endmacro -%}
 {% macro gen_9p_mount(bootlinux_9p_driver,
 		       bootlinux_9p_fsdev,
 		       bootlinux_9p_host_path,
diff --git a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
index 16364ea2..fe8be827 100644
--- a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
+++ b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
@@ -186,6 +186,17 @@
 			libvirt_extra_storage_aio_mode,
 			libvirt_extra_storage_aio_cache_mode) }}
 {% elif libvirt_extra_storage_drive_virtio %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+				    libvirt_largeio_logical_compat_size,
+				    libvirt_largeio_pow_limit,
+				    libvirt_largeio_drives_per_space,
+				    hostname,
+				    libvirt_extra_drive_format,
+				    libvirt_extra_storage_aio_mode,
+				    libvirt_extra_storage_aio_cache_mode,
+				    kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_virtio(4,
 			   kdevops_storage_pool_path,
 			   hostname,
@@ -194,7 +205,19 @@
 			   libvirt_extra_storage_aio_cache_mode,
 			   libvirt_extra_storage_virtio_logical_block_size,
 			   libvirt_extra_storage_virtio_physical_block_size) }}
+{% endif %}
 {% elif libvirt_extra_storage_drive_nvme  %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+				  libvirt_largeio_logical_compat_size,
+				  libvirt_largeio_pow_limit,
+				  libvirt_largeio_drives_per_space,
+				  hostname,
+				  libvirt_extra_drive_format,
+				  libvirt_extra_storage_aio_mode,
+				  libvirt_extra_storage_aio_cache_mode,
+				  kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_nvme(4,
 			 kdevops_storage_pool_path,
 			 hostname,
@@ -203,6 +226,7 @@
 			 libvirt_extra_storage_aio_cache_mode,
 			 libvirt_extra_storage_nvme_logical_block_size) }}
 {% endif %}
+{% endif %}
 {% if bootlinux_9p %}
   {{ drives.gen_9p_mount(bootlinux_9p_driver,
 			 bootlinux_9p_fsdev,
diff --git a/scripts/bringup_guestfs.sh b/scripts/bringup_guestfs.sh
index 7dca84fe..6f621785 100755
--- a/scripts/bringup_guestfs.sh
+++ b/scripts/bringup_guestfs.sh
@@ -109,16 +109,36 @@ do
 	cp --reflink=auto $BASE_IMAGE $ROOTIMG
 	virt-sysprep -a $ROOTIMG --hostname $name --ssh-inject "kdevops:file:$SSH_KEY.pub"
 
-	# build some extra disks
-	for i in $(seq 0 3); do
-		diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
-		rm -f $diskimg
-		qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
-		if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
-			chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
-		fi
-	done
+
+	if [[ "$CONFIG_LIBVIRT_ENABLE_LARGEIO" == "y" ]]; then
+		lbs_idx=1
+		for i in $(seq 1 $(($CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT+1))); do
+			for x in $(seq 1 $CONFIG_QEMU_EXTRA_DRIVE_LARGEIO_NUM_DRIVES_PER_SPACE); do
+				diskimg="$STORAGEDIR/$name/extra${lbs_idx}.${IMG_FMT}"
+				rm -f $diskimg
+				qemu-img create -f $IMG_FMT "$diskimg" 100G
+				if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+					chmod g+rw $diskimg
+				fi
+				let lbs_idx=$lbs_idx+1
+			done
+		done
+	else
+		# build some extra disks
+		for i in $(seq 0 3); do
+			diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
+			rm -f $diskimg
+			qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
+			if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+				chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
+			fi
+		done
+	fi
 
 	virsh define $GUESTFSDIR/$name/$name.xml
 	virsh start $name
+	if [[ $? -ne 0 ]]; then
+		echo "Failed to start $name"
+		exit 1
+	fi
 done
-- 
2.43.0


Thread overview: 7+ messages
2024-03-07  0:14 [PATCH 0/5] guestfs: start moving guest data to macros Luis Chamberlain
2024-03-07  0:14 ` [PATCH 1/5] bringup_guestfs.sh: use bash as the default shell Luis Chamberlain
2024-03-07  0:14 ` [PATCH 2/5] guestfs_q35: use hex for pci addr Luis Chamberlain
2024-03-07  0:14 ` [PATCH 3/5] guestfs_q35: use libvirt_extra_storage_nvme_logical_block_size Luis Chamberlain
2024-03-07  0:14 ` [PATCH 4/5] gen_nodes: move drive generation for guestfs to macros Luis Chamberlain
2024-03-07  0:14 ` Luis Chamberlain [this message]
2024-03-07 18:43 ` [PATCH 0/5] guestfs: start moving guest data " Luis Chamberlain
