From: Luis Chamberlain <mcgrof@kernel.org>
To: kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 5/5] guestfs: add large IO support
Date: Wed, 6 Mar 2024 16:14:26 -0800
Message-ID: <20240307001426.565390-6-mcgrof@kernel.org>
In-Reply-To: <20240307001426.565390-1-mcgrof@kernel.org>
Large IO experimentation support was added to vagrant a while ago to
enable experimentation with LBS support on the kernel [0]. Add this support
to guestfs now that guestfs is all the rage and we plan to deprecate
vagrant support as soon as we can.
Screenshot with NVMe:
$ sudo nvme list
Node Generic SN Model Namespace Usage Format FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme9n1 /dev/ng9n1 kdevops22 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme8n1 /dev/ng8n1 kdevops21 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme7n1 /dev/ng7n1 kdevops16 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme6n1 /dev/ng6n1 kdevops15 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme5n1 /dev/ng5n1 kdevops14 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme4n1 /dev/ng4n1 kdevops13 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 4 KiB + 0 B 8.2.1
/dev/nvme3n1 /dev/ng3n1 kdevops4 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme2n1 /dev/ng2n1 kdevops3 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme16n1 /dev/ng16n1 kdevops30 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 64 KiB + 0 B 8.2.1
/dev/nvme15n1 /dev/ng15n1 kdevops29 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 64 KiB + 0 B 8.2.1
/dev/nvme14n1 /dev/ng14n1 kdevops27 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme13n1 /dev/ng13n1 kdevops26 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme12n1 /dev/ng12n1 kdevops25 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 32 KiB + 0 B 8.2.1
/dev/nvme11n1 /dev/ng11n1 kdevops24 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme10n1 /dev/ng10n1 kdevops23 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 16 KiB + 0 B 8.2.1
/dev/nvme1n1 /dev/ng1n1 kdevops2 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
/dev/nvme0n1 /dev/ng0n1 kdevops1 QEMU NVMe Ctrl 1 107.37 GB / 107.37 GB 512 B + 0 B 8.2.1
$ uname -r
6.1.0-9-amd64
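The set of drives in the listing above follows from the selection rule in
the gen_drive_large_io_nvme macro added by this patch. Below is a minimal
bash sketch of that rule. The function name is hypothetical, and the inputs
(512-byte compat size, power limit of 7, 4 drives per space) are inferred
from the listing rather than stated anywhere in the patch:

```bash
#!/bin/bash
# Hypothetical helper mirroring the loops in gen_drive_large_io_nvme.
# Arguments: compat size in bytes, power limit, drives per space.
# The three values used below are assumptions inferred from the listing.
list_largeio_drives() {
    local compat_size=$1 pow_limit=$2 per_space=$3
    local max_pbs=$(( compat_size * (2 ** pow_limit) ))
    local lbs_idx=1 n x pbs next_two

    for n in $(seq 0 "$pow_limit"); do
        for x in $(seq 0 $(( per_space - 1 ))); do
            pbs=$(( compat_size * (2 ** n) ))
            next_two=$(( pbs * (2 * (x - 1)) ))
            # Same condition as the template: only 512, 4096 and
            # >= 16384 byte sizes, capped by max_pbs.
            if { [[ $pbs -eq 512 ]] || [[ $pbs -eq 4096 ]] || [[ $pbs -ge 16384 ]]; } &&
               [[ $next_two -le $max_pbs ]]; then
                echo "kdevops${lbs_idx} ${pbs}"
            fi
            # The index advances even for skipped drives, as in the template.
            lbs_idx=$(( lbs_idx + 1 ))
        done
    done
}

list_largeio_drives 512 7 4
```

With these inputs the sketch prints 17 entries, from kdevops1-4 at 512
bytes up to kdevops29-30 at 65536 bytes, which lines up with the serial
numbers and formats in the listing.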
Using drives with larger LBA formats needs changes beyond the LBS
patches, but that comes later. We have patches that make this functional
up to a 1 MiB LBA format.
For virtio, the probe fails at virtio device 14, where the guest kernel
rejects the 16 KiB (0x4000) block size with -EINVAL (-22):
Mar 06 19:03:55 d5 kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000
Mar 06 19:03:55 d5 kernel: virtio_blk: probe of virtio14 failed with error -22
Mar 06 19:03:55 d5 kernel: virtio_blk virtio15: 8/0/0 default/read/poll queues
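The hex value in that log is easier to compare against the drive block
sizes once it is in bytes. A minimal bash sketch, with a hypothetical
helper name, that extracts it from a log line like the first one above:

```bash
#!/bin/bash
# Hypothetical helper: print the rejected block size, in bytes, from a
# virtio_blk "invalid block size" kernel log line. Prints nothing and
# returns 1 if the line does not match.
virtio_invalid_block_size() {
    local line=$1
    local re='invalid block size: (0x[0-9a-fA-F]+)'

    if [[ $line =~ $re ]]; then
        # printf %d accepts C-style hex constants such as 0x4000.
        printf '%d\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

virtio_invalid_block_size \
    "kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000"
```

For the line quoted above this prints 16384, i.e. the 16 KiB drives are
the first ones this guest kernel refuses.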
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
playbooks/roles/gen_nodes/templates/drives.j2 | 72 +++++++++++++++++++
.../gen_nodes/templates/guestfs_q35.j2.xml | 24 +++++++
scripts/bringup_guestfs.sh | 38 +++++++---
3 files changed, 125 insertions(+), 9 deletions(-)
diff --git a/playbooks/roles/gen_nodes/templates/drives.j2 b/playbooks/roles/gen_nodes/templates/drives.j2
index 4878cff9..676722ae 100644
--- a/playbooks/roles/gen_nodes/templates/drives.j2
+++ b/playbooks/roles/gen_nodes/templates/drives.j2
@@ -40,6 +40,42 @@ the drives can vary by type, so we have one macro by type of drive.
{% endfor %}
<!-- End of virtio drives-->
{%- endmacro -%}
+{%- macro gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) -%}
+<!-- These are virtio drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+ <qemu:arg value='-device'/>
+ <qemu:arg value='pcie-root-port,id=pcie-port-for-virtio-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+ <qemu:arg value="-object"/>
+ <qemu:arg value="iothread,id=kdevops-virtio-iothread-{{ ns.lbs_idx }}"/>
+ <qemu:arg value='-drive'/>
+ <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,aio={{ libvirt_extra_storage_aio_mode }},cache={{ libvirt_extra_storage_aio_cache_mode }},id=drv{{ ns.lbs_idx }}'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value="virtio-blk-pci,scsi=off,drive=drv{{ ns.lbs_idx }},id=virtio-drv{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-virtio-{{ ns.lbs_idx }},addr=0x0,iothread=kdevops-virtio-iothread-{{ ns.lbs_idx }},logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}"/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of virtio drives for large IO experimentation -->
+{%- endmacro -%}
{%- macro gen_drive_nvme(num_drives,
kdevops_storage_pool_path,
hostname,
@@ -61,6 +97,42 @@ the drives can vary by type, so we have one macro by type of drive.
{% endfor %}
<!-- End of NVMe drives-->
{%- endmacro -%}
+{%- macro gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) -%}
+<!-- These are NVMe drives used for large IO experimentation, with LBS support -->
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+ <qemu:arg value='-device'/>
+ <qemu:arg value='pcie-root-port,id=pcie-port-for-nvme-{{ ns.lbs_idx }},multifunction=on,bus=pcie.1,addr=0x{{ "%0x" | format( ns.lbs_idx | int) }},chassis={{ 50 + ns.lbs_idx }}'/>
+ <qemu:arg value='-drive'/>
+ <qemu:arg value='file={{ kdevops_storage_pool_path }}/guestfs/{{ hostname }}/extra{{ ns.lbs_idx }}.{{ libvirt_extra_drive_format }},format={{ libvirt_extra_drive_format }},if=none,id=drv{{ ns.lbs_idx }}'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value='nvme,id=nvme{{ ns.lbs_idx }},serial=kdevops{{ ns.lbs_idx }},bus=pcie-port-for-nvme-{{ ns.lbs_idx }},addr=0x0'/>
+ <qemu:arg value='-device'/>
+ <qemu:arg value='nvme-ns,drive=drv{{ ns.lbs_idx }},bus=nvme{{ ns.lbs_idx }},nsid=1,logical_block_size={{ ns4.lbs }},physical_block_size={{ ns2.pbs }}'/>
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+<!-- End of NVMe drives for large IO experimentation -->
+{%- endmacro -%}
{% macro gen_9p_mount(bootlinux_9p_driver,
bootlinux_9p_fsdev,
bootlinux_9p_host_path,
diff --git a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
index 16364ea2..fe8be827 100644
--- a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
+++ b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
@@ -186,6 +186,17 @@
libvirt_extra_storage_aio_mode,
libvirt_extra_storage_aio_cache_mode) }}
{% elif libvirt_extra_storage_drive_virtio %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) }}
+{% else %}
{{ drives.gen_drive_virtio(4,
kdevops_storage_pool_path,
hostname,
@@ -194,7 +205,19 @@
libvirt_extra_storage_aio_cache_mode,
libvirt_extra_storage_virtio_logical_block_size,
libvirt_extra_storage_virtio_physical_block_size) }}
+{% endif %}
{% elif libvirt_extra_storage_drive_nvme %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+ libvirt_largeio_logical_compat_size,
+ libvirt_largeio_pow_limit,
+ libvirt_largeio_drives_per_space,
+ hostname,
+ libvirt_extra_drive_format,
+ libvirt_extra_storage_aio_mode,
+ libvirt_extra_storage_aio_cache_mode,
+ kdevops_storage_pool_path) }}
+{% else %}
{{ drives.gen_drive_nvme(4,
kdevops_storage_pool_path,
hostname,
@@ -203,6 +226,7 @@
libvirt_extra_storage_aio_cache_mode,
libvirt_extra_storage_nvme_logical_block_size) }}
{% endif %}
+{% endif %}
{% if bootlinux_9p %}
{{ drives.gen_9p_mount(bootlinux_9p_driver,
bootlinux_9p_fsdev,
diff --git a/scripts/bringup_guestfs.sh b/scripts/bringup_guestfs.sh
index 7dca84fe..6f621785 100755
--- a/scripts/bringup_guestfs.sh
+++ b/scripts/bringup_guestfs.sh
@@ -109,16 +109,36 @@ do
cp --reflink=auto $BASE_IMAGE $ROOTIMG
virt-sysprep -a $ROOTIMG --hostname $name --ssh-inject "kdevops:file:$SSH_KEY.pub"
- # build some extra disks
- for i in $(seq 0 3); do
- diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
- rm -f $diskimg
- qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
- if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
- chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
- fi
- done
+
+ if [[ "$CONFIG_LIBVIRT_ENABLE_LARGEIO" == "y" ]]; then
+ lbs_idx=1
+ for i in $(seq 1 $(($CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT+1))); do
+ for x in $(seq 1 $CONFIG_QEMU_EXTRA_DRIVE_LARGEIO_NUM_DRIVES_PER_SPACE); do
+ diskimg="$STORAGEDIR/$name/extra${lbs_idx}.${IMG_FMT}"
+ rm -f $diskimg
+ qemu-img create -f $IMG_FMT "$diskimg" 100G
+ if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+ chmod g+rw $diskimg
+ fi
+ let lbs_idx=$lbs_idx+1
+ done
+ done
+ else
+ # build some extra disks
+ for i in $(seq 0 3); do
+ diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
+ rm -f $diskimg
+ qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
+ if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+ chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
+ fi
+ done
+ fi
virsh define $GUESTFSDIR/$name/$name.xml
virsh start $name
+ if [[ $? -ne 0 ]]; then
+ echo "Failed to start $name"
+ exit 1
+ fi
done
--
2.43.0