From: Luis Chamberlain
To: kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH 5/5] guestfs: add large IO support
Date: Wed, 6 Mar 2024 16:14:26 -0800
Message-ID: <20240307001426.565390-6-mcgrof@kernel.org>
In-Reply-To: <20240307001426.565390-1-mcgrof@kernel.org>
References: <20240307001426.565390-1-mcgrof@kernel.org>

Large IO experimentation support was added to vagrant a while ago to
enable experimentation with LBS support in the kernel [0]. Add this
support to guestfs now that guestfs is all the rage and we plan to
deprecate vagrant support as soon as we can.

Screenshot with NVMe:

$ sudo nvme list
Node           Generic      SN         Model           Namespace  Usage                  Format        FW Rev
-------------  -----------  ---------  --------------  ---------  ---------------------  ------------  ------
/dev/nvme9n1   /dev/ng9n1   kdevops22  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  16 KiB + 0 B  8.2.1
/dev/nvme8n1   /dev/ng8n1   kdevops21  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  16 KiB + 0 B  8.2.1
/dev/nvme7n1   /dev/ng7n1   kdevops16  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  4 KiB + 0 B   8.2.1
/dev/nvme6n1   /dev/ng6n1   kdevops15  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  4 KiB + 0 B   8.2.1
/dev/nvme5n1   /dev/ng5n1   kdevops14  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  4 KiB + 0 B   8.2.1
/dev/nvme4n1   /dev/ng4n1   kdevops13  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  4 KiB + 0 B   8.2.1
/dev/nvme3n1   /dev/ng3n1   kdevops4   QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  512 B + 0 B   8.2.1
/dev/nvme2n1   /dev/ng2n1   kdevops3   QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  512 B + 0 B   8.2.1
/dev/nvme16n1  /dev/ng16n1  kdevops30  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  64 KiB + 0 B  8.2.1
/dev/nvme15n1  /dev/ng15n1  kdevops29  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  64 KiB + 0 B  8.2.1
/dev/nvme14n1  /dev/ng14n1  kdevops27  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  32 KiB + 0 B  8.2.1
/dev/nvme13n1  /dev/ng13n1  kdevops26  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  32 KiB + 0 B  8.2.1
/dev/nvme12n1  /dev/ng12n1  kdevops25  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  32 KiB + 0 B  8.2.1
/dev/nvme11n1  /dev/ng11n1  kdevops24  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  16 KiB + 0 B  8.2.1
/dev/nvme10n1  /dev/ng10n1  kdevops23  QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  16 KiB + 0 B  8.2.1
/dev/nvme1n1   /dev/ng1n1   kdevops2   QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  512 B + 0 B   8.2.1
/dev/nvme0n1   /dev/ng0n1   kdevops1   QEMU NVMe Ctrl  1          107.37 GB / 107.37 GB  512 B + 0 B   8.2.1

$ uname -r
6.1.0-9-amd64

To use drives with larger LBA formats we need further changes than the
LBS patches, but that's later.
We have patches to get this functional up to 1 MiB LBA format.

For virtio, we end up failing at virtio device 14:

Mar 06 19:03:55 d5 kernel: virtio_blk virtio14: virtio_blk: invalid block size: 0x4000
Mar 06 19:03:55 d5 kernel: virtio_blk: probe of virtio14 failed with error -22
Mar 06 19:03:55 d5 kernel: virtio_blk virtio15: 8/0/0 default/read/poll queues

Signed-off-by: Luis Chamberlain
---
 playbooks/roles/gen_nodes/templates/drives.j2 | 72 +++++++++++++++++++
 .../gen_nodes/templates/guestfs_q35.j2.xml    | 24 +++++++
 scripts/bringup_guestfs.sh                    | 38 +++++++---
 3 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/playbooks/roles/gen_nodes/templates/drives.j2 b/playbooks/roles/gen_nodes/templates/drives.j2
index 4878cff9..676722ae 100644
--- a/playbooks/roles/gen_nodes/templates/drives.j2
+++ b/playbooks/roles/gen_nodes/templates/drives.j2
@@ -40,6 +40,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 {%- endmacro -%}
+{%- macro gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+        libvirt_largeio_logical_compat_size,
+        libvirt_largeio_pow_limit,
+        libvirt_largeio_drives_per_space,
+        hostname,
+        libvirt_extra_drive_format,
+        libvirt_extra_storage_aio_mode,
+        libvirt_extra_storage_aio_cache_mode,
+        kdevops_storage_pool_path) -%}
+
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+
+
+
+
+
+
+
+
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+
+{%- endmacro -%}
 {%- macro gen_drive_nvme(num_drives,
         kdevops_storage_pool_path,
         hostname,
@@ -61,6 +97,42 @@ the drives can vary by type, so we have one macro by type of drive.
 {% endfor %}
 {%- endmacro -%}
+{%- macro gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+        libvirt_largeio_logical_compat_size,
+        libvirt_largeio_pow_limit,
+        libvirt_largeio_drives_per_space,
+        hostname,
+        libvirt_extra_drive_format,
+        libvirt_extra_storage_aio_mode,
+        libvirt_extra_storage_aio_cache_mode,
+        kdevops_storage_pool_path) -%}
+
+{% set ns = namespace(lbs_idx=1) %}
+{% set max_pbs = libvirt_largeio_logical_compat_size * (2 ** libvirt_largeio_pow_limit) %}
+{% for n in range(0,libvirt_largeio_pow_limit+1) %}
+{% for x in range(0,libvirt_largeio_drives_per_space) %}
+{% set ns2 = namespace(pbs=libvirt_largeio_logical_compat_size * (2 ** n)) %}
+{% set ns3 = namespace(pbs_next_two=ns2.pbs * (2*(x-1))) %}
+{% if libvirt_largeio_logical_compat %}
+{% set ns4 = namespace(lbs=libvirt_largeio_logical_compat_size) %}
+{% else %}
+{% set ns4 = namespace(lbs=ns2.pbs) %}
+{% endif %}
+{% if (ns2.pbs == 512 or ns2.pbs == 4096 or ns2.pbs >= 16384) and (ns3.pbs_next_two <= max_pbs) %}
+
+
+
+
+
+
+
+
+{% endif %}
+{% set ns.lbs_idx = ns.lbs_idx + 1 %}
+{% endfor %}
+{% endfor %}
+
+{%- endmacro -%}
 {% macro gen_9p_mount(bootlinux_9p_driver,
         bootlinux_9p_fsdev,
         bootlinux_9p_host_path,
diff --git a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
index 16364ea2..fe8be827 100644
--- a/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
+++ b/playbooks/roles/gen_nodes/templates/guestfs_q35.j2.xml
@@ -186,6 +186,17 @@
         libvirt_extra_storage_aio_mode,
         libvirt_extra_storage_aio_cache_mode) }}
 {% elif libvirt_extra_storage_drive_virtio %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_virtio(libvirt_largeio_logical_compat,
+        libvirt_largeio_logical_compat_size,
+        libvirt_largeio_pow_limit,
+        libvirt_largeio_drives_per_space,
+        hostname,
+        libvirt_extra_drive_format,
+        libvirt_extra_storage_aio_mode,
+        libvirt_extra_storage_aio_cache_mode,
+        kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_virtio(4,
         kdevops_storage_pool_path,
         hostname,
@@ -194,7 +205,19 @@
         libvirt_extra_storage_aio_cache_mode,
         libvirt_extra_storage_virtio_logical_block_size,
         libvirt_extra_storage_virtio_physical_block_size) }}
+{% endif %}
 {% elif libvirt_extra_storage_drive_nvme %}
+{% if libvirt_largeio_enable %}
+{{ drives.gen_drive_large_io_nvme(libvirt_largeio_logical_compat,
+        libvirt_largeio_logical_compat_size,
+        libvirt_largeio_pow_limit,
+        libvirt_largeio_drives_per_space,
+        hostname,
+        libvirt_extra_drive_format,
+        libvirt_extra_storage_aio_mode,
+        libvirt_extra_storage_aio_cache_mode,
+        kdevops_storage_pool_path) }}
+{% else %}
 {{ drives.gen_drive_nvme(4,
         kdevops_storage_pool_path,
         hostname,
@@ -203,6 +226,7 @@
         libvirt_extra_storage_aio_cache_mode,
         libvirt_extra_storage_nvme_logical_block_size) }}
 {% endif %}
+{% endif %}
 {% if bootlinux_9p %}
 {{ drives.gen_9p_mount(bootlinux_9p_driver,
         bootlinux_9p_fsdev,
diff --git a/scripts/bringup_guestfs.sh b/scripts/bringup_guestfs.sh
index 7dca84fe..6f621785 100755
--- a/scripts/bringup_guestfs.sh
+++ b/scripts/bringup_guestfs.sh
@@ -109,16 +109,36 @@ do
 	cp --reflink=auto $BASE_IMAGE $ROOTIMG
 	virt-sysprep -a $ROOTIMG --hostname $name --ssh-inject "kdevops:file:$SSH_KEY.pub"
 
-	# build some extra disks
-	for i in $(seq 0 3); do
-		diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
-		rm -f $diskimg
-		qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
-		if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
-			chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
-		fi
-	done
+
+	if [[ "$CONFIG_LIBVIRT_ENABLE_LARGEIO" == "y" ]]; then
+		lbs_idx=1
+		for i in $(seq 1 $(($CONFIG_QEMU_LARGEIO_MAX_POW_LIMIT+1))); do
+			for x in $(seq 0 $CONFIG_QEMU_EXTRA_DRIVE_LARGEIO_NUM_DRIVES_PER_SPACE); do
+				diskimg="$STORAGEDIR/$name/extra${lbs_idx}.${IMG_FMT}"
+				rm -f $diskimg
+				qemu-img create -f $IMG_FMT "$diskimg" 100G
+				if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+					chmod g+rw $diskimg
+				fi
+				let lbs_idx=$lbs_idx+1
+			done
+		done
+	else
+		# build some extra disks
+		for i in $(seq 0 3); do
+			diskimg="$STORAGEDIR/$name/extra${i}.${IMG_FMT}"
+			rm -f $diskimg
+			qemu-img create -f $IMG_FMT "$STORAGEDIR/$name/extra${i}.$IMG_FMT" 100G
+			if [[ "$CONFIG_LIBVIRT_URI_SYSTEM" == "y" ]]; then
+				chmod g+rw $STORAGEDIR/$name/extra${i}.$IMG_FMT
+			fi
+		done
+	fi
 
 	virsh define $GUESTFSDIR/$name/$name.xml
 	virsh start $name
+	if [[ $? -ne 0 ]]; then
+		echo "Failed to start $name"
+		exit 1
+	fi
 done
-- 
2.43.0
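
Note on the drive enumeration: the filter in gen_drive_large_io_virtio and
gen_drive_large_io_nvme is easier to follow when run standalone. The bash
sketch below mirrors only the loop arithmetic of those macros (lbs_idx, pbs,
pbs_next_two, max_pbs) and prints which index values would get a drive. It is
not part of the patch. The three parameter values are assumptions, picked
because they appear to reproduce the serial numbers in the nvme list output in
the commit message; the actual Kconfig values are not shown in this patch, and
the function name list_large_io_drives is hypothetical.

```bash
#!/usr/bin/env bash
# Assumed values, not taken from the patch: they stand in for
# libvirt_largeio_logical_compat_size, libvirt_largeio_pow_limit and
# libvirt_largeio_drives_per_space respectively.
compat_size=512
pow_limit=7
drives_per_space=4

# Hypothetical helper: walk the same nested loops as the Jinja2 macros and
# print one line per (lbs_idx, pbs) pair that passes the template's filter.
list_large_io_drives() {
	local max_pbs=$(( compat_size * (1 << pow_limit) ))
	local lbs_idx=1 n x pbs pbs_next_two

	for (( n = 0; n <= pow_limit; n++ )); do
		for (( x = 0; x < drives_per_space; x++ )); do
			pbs=$(( compat_size * (1 << n) ))
			pbs_next_two=$(( pbs * (2 * (x - 1)) ))
			# Same condition as the template: only 512, 4096 and
			# >= 16384 byte sizes, capped via pbs_next_two.
			if (( (pbs == 512 || pbs == 4096 || pbs >= 16384) &&
			      pbs_next_two <= max_pbs )); then
				echo "lbs_idx=${lbs_idx} pbs=${pbs}"
			fi
			# The index advances even when no drive is emitted.
			lbs_idx=$(( lbs_idx + 1 ))
		done
	done
}

list_large_io_drives
```

With these assumed values the sketch prints 17 lines, for lbs_idx 1-4 (512),
13-16 (4096), 21-24 (16384), 25-27 (32768) and 29-30 (65536), which lines up
with the kdevopsN serials and formats in the screenshot. Because lbs_idx
advances on every iteration, the indexes have gaps wherever the filter rejects
a size.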