From: Luis Chamberlain <mcgrof@kernel.org>
To: kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 5/5] guestfs: add large IO support
Date: Wed,  6 Mar 2024 16:14:26 -0800
Message-ID: <20240307001426.565390-6-mcgrof@kernel.org>
In-Reply-To: <20240307001426.565390-1-mcgrof@kernel.org>

Large IO experimentation support was added to vagrant a while ago to
enable experimentation with LBS support in the kernel [0]. Add the same
support to guestfs now that guestfs is all the rage, and we plan to
deprecate vagrant support as soon as we can.

Example output with NVMe:

$ sudo nvme list
Node                  Generic               SN                   Model                                    Namespace Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme9n1          /dev/ng9n1            kdevops22            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme8n1          /dev/ng8n1            kdevops21            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme7n1          /dev/ng7n1            kdevops16            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme6n1          /dev/ng6n1            kdevops15            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme5n1          /dev/ng5n1            kdevops14            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme4n1          /dev/ng4n1            kdevops13            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB      4 KiB +  0 B   8.2.1
/dev/nvme3n1          /dev/ng3n1            kdevops4             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme2n1          /dev/ng2n1            kdevops3             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme16n1         /dev/ng16n1           kdevops30            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     64 KiB +  0 B   8.2.1
/dev/nvme15n1         /dev/ng15n1           kdevops29            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     64 KiB +  0 B   8.2.1
/dev/nvme14n1         /dev/ng14n1           kdevops27            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme13n1         /dev/ng13n1           kdevops26            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme12n1         /dev/ng12n1           kdevops25            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     32 KiB +  0 B   8.2.1
/dev/nvme11n1         /dev/ng11n1           kdevops24            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme10n1         /dev/ng10n1           kdevops23            QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB     16 KiB +  0 B   8.2.1
/dev/nvme1n1          /dev/ng1n1            kdevops2             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1
/dev/nvme0n1          /dev/ng0n1            kdevops1             QEMU NVMe Ctrl                           1         107.37  GB / 107.37  GB    512   B +  0 B   8.2.1

$ uname -r
6.1.0-9-amd64
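
The serial numbers above have gaps (kdevops5 through kdevops12 are
missing, for example) because the template skips some block sizes but
still advances the drive index. Below is a minimal Python sketch of the
loop used by the large IO macros in drives.j2. The default parameter
values (512 byte compat size, pow limit of 7, 4 drives per space, logical
compat disabled) are assumptions inferred from the listing, they are not
stated anywhere in this patch:

```python
def large_io_drives(compat_size=512, pow_limit=7, drives_per_space=4,
                    logical_compat=False):
    """Return (index, logical_block_size, physical_block_size) per emitted drive.

    Mirrors the nested loops of gen_drive_large_io_nvme(); the default
    values are assumptions picked to match the nvme list output.
    """
    max_pbs = compat_size * (2 ** pow_limit)
    drives = []
    idx = 1
    for n in range(0, pow_limit + 1):
        for x in range(0, drives_per_space):
            pbs = compat_size * (2 ** n)
            pbs_next_two = pbs * (2 * (x - 1))
            lbs = compat_size if logical_compat else pbs
            # Only 512, 4096 and >= 16384 are used, and the larger sizes
            # get fewer drives as pbs_next_two crosses max_pbs.
            if (pbs == 512 or pbs == 4096 or pbs >= 16384) and pbs_next_two <= max_pbs:
                drives.append((idx, lbs, pbs))
            # The index advances even when a drive is skipped.
            idx += 1
    return drives


for idx, lbs, pbs in large_io_drives():
    print(f"kdevops{idx}: logical={lbs} physical={pbs}")
```

With these assumed values the sketch yields 17 drives, matching the 17
namespaces listed above: four each at 512 bytes, 4 KiB and 16 KiB, three
at 32 KiB and two at 64 KiB.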

Using drives with even larger LBA formats needs further changes beyond
the LBS patches, but that comes later. We have patches which get this
functional up to a 1 MiB LBA format.

For virtio, the guest kernel fails to probe virtio device 14, which is
given a 16 KiB (0x4000) block size:

Mar 06 19:03:55 d5 kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000
Mar 06 19:03:55 d5 kernel: virtio_blk: probe of virtio14 failed with error -22
Mar 06 19:03:55 d5 kernel: virtio_blk virtio15: 8/0/0 default/read/poll queues
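
The error -22 is -EINVAL, and 0x4000 is 16384 bytes. As far as I can
tell this guest kernel rejects block sizes which are not a power of two
between 512 bytes and the page size. The sketch below approximates that
check in Python. It is an assumption drawn from the log above, not a copy
of the driver code, and the 4 KiB page size is assumed for the amd64
guest:

```python
SECTOR_SIZE = 512
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the amd64 guest


def block_size_is_valid(bsize, page_size=PAGE_SIZE):
    """Approximate the range check: a power of two within [512, page_size]."""
    is_pow2 = bsize > 0 and (bsize & (bsize - 1)) == 0
    return is_pow2 and SECTOR_SIZE <= bsize <= page_size


for size in (512, 4096, 0x4000):
    verdict = "ok" if block_size_is_valid(size) else "rejected (-EINVAL)"
    print(f"{size:#x}: {verdict}")
```

Under that assumption the 512 byte and 4 KiB virtio drives probe fine,
and every drive of 16 KiB and above fails until the guest kernel has LBS
support.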

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 playbooks/roles/gen_nodes/templates/drives.j2 | 72 +++++++++++++++++++
 .../gen_nodes/templates/guestfs_q35.j2.xml    | 24 +++++++
 scripts/bringup_guestfs.sh                    | 38 +++++++---
 3 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/playbooks/roles/gen_nodes/templates/drives.j2 b/playbooks/roles/gen_nodes/templates/drives.j2
index 4878cff9..676722ae 100644
--- a/playbooks/roles/gen_nodes/templates/drives.j2
+++ b/playbooks/roles/gen_nodes/templates/drives.j2
@@ -40,6 +40,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 <!-- End of virtio drives-->
 {%- endmacro -%}
+{%- macro gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+				    libvirt_largeio_logical_compat_size,
+				    libvirt_largeio_pow_limit,
+				    libvirt_largeio_drives_per_space,
+				    hostname,
+				    libvirt_extra_drive_format,
+				    libvirt_extra_storage_aio_mode,
+				    libvirt_extra_storage_aio_cache_mode,
+				    kdevops_storage_pool_path) -%}
+<!-- These are virtio drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1)  %}
+{% set max_pbs = libvirt_largeio_logical_compat_size  * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size  * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+    <qemu:arg value='-device'/>
+    <qemu:arg value='pcie-root-port,id=pcie-port-for-virtio-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx  }}'/>
+    <qemu:arg value="-object"/>
+    <qemu:arg value="iothread,id=kdevops-virtio-iothread-{{ ns.lbs_idx }}"/>
+    <qemu:arg value='-drive'/>
+    <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,aio={{ libvirt_extra_storage_aio_mode }},cache={{ libvirt_extra_storage_aio_cache_mode }},id=drv{{ ns.lbs_idx }}'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value="virtio-blk-pci,scsi=off,drive=drv{{ ns.lbs_idx }},id=virtio-drv{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-virtio-{{ ns.lbs_idx }},addr=0x0,iothread=kdevops-virtio-iothread-{{ ns.lbs_idx }},logical_block_size={{  ns4.lbs }},physical_block_size={{ ns2.pbs }}"/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of virtio drives for large IO experimentation -->
+{%- endmacro -%}
 {%- macro gen_drive_nvme(num_drives,
 			 kdevops_storage_pool_path,
 			 hostname,
@@ -61,6 +97,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 <!-- End of NVMe drives-->
 {%- endmacro -%}
+{%- macro gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+				  libvirt_largeio_logical_compat_size,
+				  libvirt_largeio_pow_limit,
+				  libvirt_largeio_drives_per_space,
+				  hostname,
+				  libvirt_extra_drive_format,
+				  libvirt_extra_storage_aio_mode,
+				  libvirt_extra_storage_aio_cache_mode,
+				  kdevops_storage_pool_path) -%}
+<!-- These are NVMe drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1)  %}
+{% set max_pbs = libvirt_largeio_logical_compat_size  * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size  * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+    <qemu:arg value='-device'/>
+    <qemu:arg value='pcie-root-port,id=pcie-port-for-nvme-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+    <qemu:arg value='-drive'/>
+    <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,id=drv{{ ns.lbs_idx }}'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value='nvme,id=nvme{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-nvme-{{ ns.lbs_idx }},addr=0x0'/>
+    <qemu:arg value='-device'/>
+    <qemu:arg value='nvme-ns,drive=drv{{ ns.lbs_idx }},bus=nvme{{ ns.lbs_idx }},nsid=1,logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}'/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of NVMe drives for large IO experimentation -->
+{%- endmacro -%}
 {% macro gen_9p_mount(bootlinux_9p_driver,
 		       bootlinux_9p_fsdev,
 		       bootlinux_9p_host_path,
diff --git a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
index 16364ea2..fe8be827 100644
--- a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
+++ b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
@@ -186,6 +186,17 @@
 			libvirt_extra_storage_aio_mode,
 			libvirt_extra_storage_aio_cache_mode) }}
 {% elif libvirt_extra_storage_drive_virtio %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+				    libvirt_largeio_logical_compat_size,
+				    libvirt_largeio_pow_limit,
+				    libvirt_largeio_drives_per_space,
+				    hostname,
+				    libvirt_extra_drive_format,
+				    libvirt_extra_storage_aio_mode,
+				    libvirt_extra_storage_aio_cache_mode,
+				    kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_virtio(4,
 			   kdevops_storage_pool_path,
 			   hostname,
@@ -194,7 +205,19 @@
 			   libvirt_extra_storage_aio_cache_mode,
 			   libvirt_extra_storage_virtio_logical_block_size,
 			   libvirt_extra_storage_virtio_physical_block_size) }}
+{% endif %}
 {% elif libvirt_extra_storage_drive_nvme  %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+				  libvirt_largeio_logical_compat_size,
+				  libvirt_largeio_pow_limit,
+				  libvirt_largeio_drives_per_space,
+				  hostname,
+				  libvirt_extra_drive_format,
+				  libvirt_extra_storage_aio_mode,
+				  libvirt_extra_storage_aio_cache_mode,
+				  kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_nvme(4,
 			 kdevops_storage_pool_path,
 			 hostname,
@@ -203,6 +226,7 @@
 			 libvirt_extra_storage_aio_cache_mode,
 			 libvirt_extra_storage_nvme_logical_block_size) }}
 {% endif %}
+{% endif %}
 {% if bootlinux_9p %}
   {{ drives.gen_9p_mount(bootlinux_9p_driver,
 			 bootlinux_9p_fsdev,
diff --git a/scripts/bringup_guestfs.sh b/scripts/bringup_guestfs.sh
index 7dca84fe..6f621785 100755
--- a/scripts/bringup_guestfs.sh
+++ b/scripts/bringup_guestfs.sh
@@ -109,16 +109,36 @@ do
 	cp --reflink=auto $BASE_IMAGE $ROOTIMG
 	virt-sysprep -a $ROOTIMG --hostname $name --ssh-inject "kdevops:file:$SSH_KEY.pub"
 
-	# build some extra disks
-	for i in $(seq 0 3); do
-		diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
-		rm -f $diskimg
-		qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
-		if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
-			chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
-		fi
-	done
+
+	if [[ "$CONFIG_LIBVIRT_ENABLE_LARGEIO" == "y" ]]; then
+		lbs_idx=1
+		for i in $(seq 1 $(($CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT+1))); do
+			for x in $(seq 1 $CONFIG_QEMU_EXTRA_DRIVE_LARGEIO_NUM_DRIVES_PER_SPACE); do
+				diskimg="$STORAGEDIR/$name/extra${lbs_idx}.${IMG_FMT}"
+				rm -f $diskimg
+				qemu-img create -f $IMG_FMT "$diskimg" 100G
+				if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+					chmod g+rw $diskimg
+				fi
+				let lbs_idx=$lbs_idx+1
+			done
+		done
+	else
+		# build some extra disks
+		for i in $(seq 0 3); do
+			diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
+			rm -f $diskimg
+			qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
+			if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+				chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
+			fi
+		done
+	fi
 
 	virsh define $GUESTFSDIR/$name/$name.xml
 	virsh start $name
+	if [[ $? -ne 0 ]]; then
+		echo "Failed to start $name"
+		exit 1
+	fi
 done
-- 
2.43.0


