From: Luis Chamberlain <mcgrof@kernel.org>
To: kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 5/5] guestfs: add large IO support
Date: Wed, 6 Mar 2024 16:14:26 -0800
Message-ID: <20240307001426.565390-6-mcgrof@kernel.org>
In-Reply-To: <20240307001426.565390-1-mcgrof@kernel.org>
Large IO experimentation support was added to vagrant a while ago to
enable experimentation with LBS support on the kernel [0]. Add this
support to guestfs now that guestfs is all the rage, as we plan to
deprecate vagrant support as soon as we can.
Screenshot with NVMe:
$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme9n1 /dev/ng9n1 kdevops22 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme8n1 /dev/ng8n1 kdevops21 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme7n1 /dev/ng7n1 kdevops16 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme6n1 /dev/ng6n1 kdevops15 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme5n1 /dev/ng5n1 kdevops14 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme4n1 /dev/ng4n1 kdevops13 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme3n1 /dev/ng3n1 kdevops4 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme2n1 /dev/ng2n1 kdevops3 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme16n1 /dev/ng16n1 kdevops30 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 64 KiB + 0 B 8.2.1
/dev/nvme15n1 /dev/ng15n1 kdevops29 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 64 KiB + 0 B 8.2.1
/dev/nvme14n1 /dev/ng14n1 kdevops27 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme13n1 /dev/ng13n1 kdevops26 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme12n1 /dev/ng12n1 kdevops25 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme11n1 /dev/ng11n1 kdevops24 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme10n1 /dev/ng10n1 kdevops23 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme1n1 /dev/ng1n1 kdevops2 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme0n1 /dev/ng0n1 kdevops1 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
$ uname -r
6.1.0-9-amd64
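Inside the guest, the same geometry can be cross-checked with util-linux
blockdev(8); the device paths below follow the nvme list output above and
will differ on another topology, so treat this as a sketch:

```shell
#!/bin/bash
# Cross-check logical/physical block sizes the guest kernel reports.
# Device paths are taken from the listing above; adjust as needed.
checked=0
for dev in /dev/nvme0n1 /dev/nvme9n1 /dev/nvme16n1; do
    [ -b "$dev" ] || continue        # skip drives not present here
    lbs=$(blockdev --getss "$dev")   # logical block size (bytes)
    pbs=$(blockdev --getpbsz "$dev") # physical block size (bytes)
    printf '%s: lbs=%d pbs=%d\n' "$dev" "$lbs" "$pbs"
    checked=$((checked + 1))
done
echo "drives checked: $checked"
```

For the 16 KiB drives above this should report lbs=16384 pbs=16384 (or
lbs=512 when logical compat mode is enabled).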
To use drives with larger LBA formats we need changes beyond the LBS
patches, but that comes later. We have patches to get this functional up
to a 1 MiB LBA format.
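The drive matrix the new Jinja macros generate follows a simple rule:
physical block sizes double from the compat size up to the power-of-two
limit, and only 512, 4096, and sizes >= 16384 are kept (1024, 2048 and
8192 are skipped). A standalone sketch of that selection, using
illustrative values (compat size 512, power limit 7, which would yield
the 512 B through 64 KiB formats in the screenshot) and ignoring the
per-space drive count for brevity:

```shell
#!/bin/bash
# Sketch of the block-size selection in gen_drive_large_io_nvme().
# compat_size and pow_limit are illustrative values, not the Kconfig defaults.
compat_size=512
pow_limit=7
for ((n = 0; n <= pow_limit; n++)); do
    pbs=$((compat_size * (1 << n)))
    # Keep only the LBA formats the experiment targets: 512, 4096, >= 16384.
    if ((pbs == 512 || pbs == 4096 || pbs >= 16384)); then
        echo "$pbs"
    fi
done
```

This prints 512, 4096, 16384, 32768 and 65536; bumping pow_limit extends
the list toward the 1 MiB format mentioned above.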
For virtio, we end up failing at virtio device 14:
Mar 06 19:03:55 d5 kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000
Mar 06 19:03:55 d5 kernel: virtio_blk: probe of virtio14 failed with error -22
Mar 06 19:03:55 d5 kernel: virtio_blk virtio15: 8/0/0 default/read/poll queues
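The failure is expected: the kernel's generic block size validation
rejects any block size larger than PAGE_SIZE, and 0x4000 is 16 KiB
against a 4 KiB x86-64 page, so probe fails with -EINVAL (-22). A quick
sanity check of the numbers in that log line:

```shell
#!/bin/bash
# Decode the block size virtio-blk rejected and compare it to the page size.
rejected=0x4000
page_size=$(getconf PAGESIZE)   # typically 4096 on x86-64
printf 'rejected block size: %d bytes\n' "$((rejected))"
printf 'page size: %d bytes\n' "$page_size"
if (( rejected > page_size )); then
    echo "virtio-blk probe fails: block size exceeds PAGE_SIZE"
fi
```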
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
playbooks/roles/gen_nodes/templates/drives.j2 | 72 +++++++++++++++++++
.../gen_nodes/templates/guestfs_q35.j2.xml | 24 +++++++
scripts/bringup_guestfs.sh | 38 +++++++---
3 files changed, 125 insertions(+), 9 deletions(-)
diff --git a/playbooks/roles/gen_nodes/templates/drives.j2 b/playbooks/roles/gen_nodes/templates/drives.j2
index 4878cff9..676722ae 100644
--- a/playbooks/roles/gen_nodes/templates/drives.j2
+++ b/playbooks/roles/gen_nodes/templates/drives.j2
@@ -40,6 +40,42 @@ the drives can vary by type, so we have one macro by type of drive.
{% endfor %}
<!-- End of virtio drives-->
{%- endmacro -%}
+{%- macro gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) -%}
+<!-- These are virtio drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+ <qemu:arg value='-device'/>
+ <qemu:arg value='pcie-root-port,id=pcie-port-for-virtio-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+ <qemu:arg value="-object"/>
+ <qemu:arg value="iothread,id=kdevops-virtio-iothread-{{ ns.lbs_idx }}"/>
+ <qemu:arg value='-drive'/>
+ <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,aio={{ libvirt_extra_storage_aio_mode }},cache={{ libvirt_extra_storage_aio_cache_mode }},id=drv{{ ns.lbs_idx }}'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value="virtio-blk-pci,scsi=off,drive=drv{{ ns.lbs_idx }},id=virtio-drv{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-virtio-{{ ns.lbs_idx }},addr=0x0,iothread=kdevops-virtio-iothread-{{ ns.lbs_idx }},logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}"/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of virtio drives for large IO experimentation -->
+{%- endmacro -%}
{%- macro gen_drive_nvme(num_drives,
kdevops_storage_pool_path,
hostname,
@@ -61,6 +97,42 @@ the drives can vary by type, so we have one macro by type of drive.
{% endfor %}
<!-- End of NVMe drives-->
{%- endmacro -%}
+{%- macro gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) -%}
+<!-- These are NVMe drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+ <qemu:arg value='-device'/>
+ <qemu:arg value='pcie-root-port,id=pcie-port-for-nvme-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+ <qemu:arg value='-drive'/>
+ <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,id=drv{{ ns.lbs_idx }}'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value='nvme,id=nvme{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-nvme-{{ ns.lbs_idx }},addr=0x0'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value='nvme-ns,drive=drv{{ ns.lbs_idx }},bus=nvme{{ ns.lbs_idx }},nsid=1,logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}'/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of NVMe drives for large IO experimentation -->
+{%- endmacro -%}
{% macro gen_9p_mount(bootlinux_9p_driver,
bootlinux_9p_fsdev,
bootlinux_9p_host_path,
diff --git a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
index 16364ea2..fe8be827 100644
--- a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
+++ b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
@@ -186,6 +186,17 @@
libvirt_extra_storage_aio_mode,
libvirt_extra_storage_aio_cache_mode) }}
{% elif libvirt_extra_storage_drive_virtio %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) }}
+{% else %}
{{ drives.gen_drive_virtio(4,
kdevops_storage_pool_path,
hostname,
@@ -194,7 +205,19 @@
libvirt_extra_storage_aio_cache_mode,
libvirt_extra_storage_virtio_logical_block_size,
libvirt_extra_storage_virtio_physical_block_size) }}
+{% endif %}
{% elif libvirt_extra_storage_drive_nvme %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) }}
+{% else %}
{{ drives.gen_drive_nvme(4,
kdevops_storage_pool_path,
hostname,
@@ -203,6 +226,7 @@
libvirt_extra_storage_aio_cache_mode,
libvirt_extra_storage_nvme_logical_block_size) }}
{% endif %}
+{% endif %}
{% if bootlinux_9p %}
{{ drives.gen_9p_mount(bootlinux_9p_driver,
bootlinux_9p_fsdev,
diff --git a/scripts/bringup_guestfs.sh b/scripts/bringup_guestfs.sh
index 7dca84fe..6f621785 100755
--- a/scripts/bringup_guestfs.sh
+++ b/scripts/bringup_guestfs.sh
@@ -109,16 +109,36 @@ do
cp --reflink=auto $BASE_IMAGE $ROOTIMG
virt-sysprep -a $ROOTIMG --hostname $name --ssh-inject "kdevops:file:$SSH_KEY.pub"
- # build some extra disks
- for i in $(seq 0 3); do
- diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
- rm -f $diskimg
- qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
- if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
- chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
- fi
- done
+
+ if [[ "$CONFIG_LIBVIRT_ENABLE_LARGEIO" == "y" ]]; then
+ lbs_idx=1
+ for i in $(seq 1 $(($CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT+1))); do
+ for x in $(seq 0 $CONFIG_QEMU_EXTRA_DRIVE_LARGEIO_NUM_DRIVES_PER_SPACE); do
+ diskimg="$STORAGEDIR/$name/extra${lbs_idx}.${IMG_FMT}"
+ rm -f $diskimg
+ qemu-img create -f $IMG_FMT "$diskimg" 100G
+ if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+ chmod g+rw $diskimg
+ fi
+ let lbs_idx=$lbs_idx+1
+ done
+ done
+ else
+ # build some extra disks
+ for i in $(seq 0 3); do
+ diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
+ rm -f $diskimg
+ qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
+ if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+ chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
+ fi
+ done
+ fi
virsh define $GUESTFSDIR/$name/$name.xml
virsh start $name
+ if [[ $? -ne 0 ]]; then
+ echo "Failed to start $name"
+ exit 1
+ fi
done
--
2.43.0