From mboxrd@z Thu Jan  1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, hui81.qi@samsung.com, kundan.kumar@samsung.com,
	kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
Date: Wed, 27 Aug 2025 02:32:01 -0700
Message-ID: <20250827093202.3539990-3-mcgrof@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250827093202.3539990-1-mcgrof@kernel.org>
References: <20250827093202.3539990-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain

Extend the AI workflow to support testing Milvus across multiple
filesystem configurations simultaneously. This enables comprehensive
performance comparisons between different filesystems and their
configuration options.
Key features:

  - Dynamic node generation based on enabled filesystem configurations
  - Support for XFS, EXT4, and BTRFS with various mount options
  - Per-filesystem result collection and analysis
  - A/B testing across all filesystem configurations
  - Automated comparison graphs between filesystems

Filesystem configurations:

  - XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
  - EXT4: default, nojournal, bigalloc configurations
  - BTRFS: default, zlib, lzo, zstd compression options

Defconfigs:

  - ai-milvus-multifs: Test 7 filesystem configs with A/B testing
  - ai-milvus-multifs-distro: Test with distribution kernels
  - ai-milvus-multifs-extended: Extended configs (14 filesystems total)

Node generation:

The system dynamically generates nodes based on enabled filesystem
configurations. With A/B testing enabled, this creates baseline and dev
nodes for each filesystem (e.g., debian13-ai-xfs-4k and
debian13-ai-xfs-4k-dev).

Usage:

  make defconfig-ai-milvus-multifs
  make bringup      # Creates nodes for each filesystem
  make ai           # Setup infrastructure on all nodes
  make ai-tests     # Run benchmarks on all filesystems
  make ai-results   # Collect and compare results

This enables systematic evaluation of how different filesystems and
their configurations affect vector database performance.
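For reviewers, the baseline/dev node naming described above can be sketched as follows. This is a hypothetical illustration only, not the actual gen_nodes template logic; the helper name and base-name parameter are invented for the example:

```python
def generate_node_names(base, fs_configs, ab_testing=True):
    """Derive per-filesystem node names the way the commit message
    describes: one baseline node per enabled filesystem config, plus a
    matching "-dev" node when A/B testing is enabled."""
    nodes = []
    for cfg in fs_configs:
        name = f"{base}-{cfg}"
        nodes.append(name)               # baseline node
        if ab_testing:
            nodes.append(f"{name}-dev")  # dev node for A/B comparison
    return nodes

# Example from the commit message: XFS with a 4k block size
print(generate_node_names("debian13-ai", ["xfs-4k"]))
# ['debian13-ai-xfs-4k', 'debian13-ai-xfs-4k-dev']
```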
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain
---
 .github/workflows/docker-tests.yml            |    6 +
 Makefile                                      |    2 +-
 defconfigs/ai-milvus-multifs                  |   67 +
 defconfigs/ai-milvus-multifs-distro           |  109 ++
 defconfigs/ai-milvus-multifs-extended         |  108 ++
 docs/ai/vector-databases/README.md            |    1 -
 playbooks/ai_install.yml                      |    6 +
 playbooks/ai_multifs.yml                      |   24 +
 .../host_vars/debian13-ai-xfs-4k-4ks.yml      |   10 -
 .../files/analyze_results.py                  | 1132 +++++++++++---
 .../files/generate_better_graphs.py           |   16 +-
 .../files/generate_graphs.py                  |  888 ++++------
 .../files/generate_html_report.py             |  263 +++-
 .../roles/ai_collect_results/tasks/main.yml   |   42 +-
 .../templates/analysis_config.json.j2         |    2 +-
 .../roles/ai_milvus_storage/tasks/main.yml    |  161 ++
 .../tasks/generate_comparison.yml             |  279 ++++
 playbooks/roles/ai_multifs_run/tasks/main.yml |   23 +
 .../tasks/run_single_filesystem.yml           |  104 ++
 .../templates/milvus_config.json.j2           |   42 +
 .../roles/ai_multifs_setup/defaults/main.yml  |   49 +
 .../roles/ai_multifs_setup/tasks/main.yml     |   70 +
 .../files/milvus_benchmark.py                 |  164 +-
 playbooks/roles/gen_hosts/tasks/main.yml      |   19 +
 .../roles/gen_hosts/templates/fstests.j2      |    2 +
 playbooks/roles/gen_hosts/templates/gitr.j2   |    2 +
 playbooks/roles/gen_hosts/templates/hosts.j2  |   35 +-
 .../roles/gen_hosts/templates/nfstest.j2      |    2 +
 playbooks/roles/gen_hosts/templates/pynfs.j2  |    2 +
 playbooks/roles/gen_nodes/tasks/main.yml      |   90 ++
 .../roles/guestfs/tasks/bringup/main.yml      |   15 +
 scripts/guestfs.Makefile                      |    2 +-
 workflows/ai/Kconfig                          |   13 +
 workflows/ai/Kconfig.fs                       |  118 ++
 workflows/ai/Kconfig.multifs                  |  184 +++
 workflows/ai/scripts/analysis_config.json     |    2 +-
 workflows/ai/scripts/analyze_results.py       | 1132 +++++++++++---
 workflows/ai/scripts/generate_graphs.py       | 1372 ++++-----------
 workflows/ai/scripts/generate_html_report.py  |   94 +-
 39 files changed, 4356 insertions(+), 2296 deletions(-)
 create mode 100644 defconfigs/ai-milvus-multifs
 create mode 100644 defconfigs/ai-milvus-multifs-distro
 create mode 100644 defconfigs/ai-milvus-multifs-extended
 create mode 100644 playbooks/ai_multifs.yml
 delete mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
 create mode 100644 playbooks/roles/ai_milvus_storage/tasks/main.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/main.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
 create mode 100644 playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
 create mode 100644 playbooks/roles/ai_multifs_setup/defaults/main.yml
 create mode 100644 playbooks/roles/ai_multifs_setup/tasks/main.yml
 create mode 100644 workflows/ai/Kconfig.fs
 create mode 100644 workflows/ai/Kconfig.multifs

diff --git a/.github/workflows/docker-tests.yml b/.github/workflows/docker-tests.yml
index c0e0d03d..adea1182 100644
--- a/.github/workflows/docker-tests.yml
+++ b/.github/workflows/docker-tests.yml
@@ -53,3 +53,9 @@ jobs:
           echo "Running simple make targets on ${{ matrix.distro_container }} environment"
           make mrproper
+      - name: Test fio-tests defconfig
+        run: |
+          echo "Testing fio-tests CI configuration"
+          make defconfig-fio-tests-ci
+          make
+          echo "Configuration test passed for fio-tests"
diff --git a/Makefile b/Makefile
index 8755577e..83c67340 100644
--- a/Makefile
+++ b/Makefile
@@ -226,7 +226,7 @@ include scripts/bringup.Makefile
 endif
 
 DEFAULT_DEPS += $(ANSIBLE_INVENTORY_FILE)
-$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE)
+$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE) $(KDEVOPS_NODES)
 	$(Q)ANSIBLE_LOCALHOST_WARNING=False ANSIBLE_INVENTORY_UNPARSED_WARNING=False \
 		ansible-playbook $(ANSIBLE_VERBOSE) \
 		$(KDEVOPS_PLAYBOOKS_DIR)/gen_hosts.yml \
diff --git a/defconfigs/ai-milvus-multifs b/defconfigs/ai-milvus-multifs
new file mode 100644
index 00000000..7e5ad971
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs
@@ -0,0 +1,67 @@
+CONFIG_GUESTFS=y
+CONFIG_LIBVIRT=y
+
+# Disable mirror features for CI/testing
+# CONFIG_ENABLE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_USE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_INSTALL_ONLY_GIT_DAEMON is not set
+# CONFIG_MIRROR_INSTALL is not set
+
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+
+CONFIG_BOOTLINUX=y
+CONFIG_BOOTLINUX_9P=y
+
+# Enable A/B testing with different kernel references
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
+
+# AI workflow configuration
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# Vector database configuration
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Enable multi-filesystem testing
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+
+# Test XFS with different block sizes
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# Test EXT4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# Test BTRFS
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Performance settings
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_ITERATIONS=5
+
+# Dataset configuration for benchmarking
+CONFIG_AI_VECTOR_DB_MILVUS_DATASET_SIZE=100000
+CONFIG_AI_VECTOR_DB_MILVUS_BATCH_SIZE=10000
+CONFIG_AI_VECTOR_DB_MILVUS_NUM_QUERIES=10000
+
+# Container configuration
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5=y
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
\ No newline at end of file
diff --git a/defconfigs/ai-milvus-multifs-distro b/defconfigs/ai-milvus-multifs-distro
new file mode 100644
index 00000000..fb71f2b5
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-distro
@@ -0,0 +1,109 @@
+# AI Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k and 16k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile) using the distribution kernel without A/B testing.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_MILVUS_DOCKER=y
+CONFIG_AI_VECTOR_DB_TYPE_MILVUS=y
+
+# Milvus Configuration
+CONFIG_AI_MILVUS_HOST="localhost"
+CONFIG_AI_MILVUS_PORT=19530
+CONFIG_AI_MILVUS_DATABASE_NAME="ai_benchmark"
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# XFS configurations
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Disable A/B testing to use single baseline configuration
+# CONFIG_KDEVOPS_BASELINE_AND_DEV is not set
diff --git a/defconfigs/ai-milvus-multifs-extended b/defconfigs/ai-milvus-multifs-extended
new file mode 100644
index 00000000..7886c8c4
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-extended
@@ -0,0 +1,108 @@
+# AI Extended Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k, 16k, 32k, 64k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile) using the distribution kernel without A/B testing.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# Extended XFS configurations (4k, 16k, 32k, 64k block sizes)
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Baseline/dev testing setup
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+# Build Linux
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
diff --git a/docs/ai/vector-databases/README.md b/docs/ai/vector-databases/README.md
index 2a3955d7..0fdd204b 100644
--- a/docs/ai/vector-databases/README.md
+++ b/docs/ai/vector-databases/README.md
@@ -52,7 +52,6 @@ Vector databases heavily depend on storage performance. The workflow tests across:
 
 - **XFS**: Default for many production deployments
 - **ext4**: Traditional Linux filesystem
 - **btrfs**: Copy-on-write with compression support
-- **ZFS**: Advanced features for data integrity
 
 ## Configuration Dimensions
diff --git a/playbooks/ai_install.yml b/playbooks/ai_install.yml
index 70b734e4..38e6671c 100644
--- a/playbooks/ai_install.yml
+++ b/playbooks/ai_install.yml
@@ -4,5 +4,11 @@
   become: true
   become_user: root
   roles:
+    - role: ai_docker_storage
+      when: ai_docker_storage_enable | default(true)
+      tags: ['ai', 'docker', 'storage']
+    - role: ai_milvus_storage
+      when: ai_milvus_storage_enable | default(false)
+      tags: ['ai', 'milvus', 'storage']
     - role: milvus
       tags: ['ai', 'vector_db', 'milvus', 'install']
diff --git a/playbooks/ai_multifs.yml b/playbooks/ai_multifs.yml
new file mode 100644
index 00000000..637f11f4
--- /dev/null
+++ b/playbooks/ai_multifs.yml
@@ -0,0 +1,24 @@
+---
+- hosts: baseline
+  become: yes
+  gather_facts: yes
+  vars:
+    ai_benchmark_results_dir: "{{ ai_multifs_results_dir | default('/data/ai-multifs-benchmark') }}"
+  roles:
+    - role: ai_multifs_setup
+    - role: ai_multifs_run
+  tasks:
+    - name: Final multi-filesystem testing summary
+      debug:
+        msg: |
+          Multi-filesystem AI benchmark testing completed!
+
+          Results directory: {{ ai_multifs_results_dir }}
+          Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
+
+          Individual filesystem results:
+          {% for config in ai_multifs_configurations %}
+          {% if config.enabled %}
+          - {{ config.name }}: {{ ai_multifs_results_dir }}/{{ config.name }}/
+          {% endif %}
+          {% endfor %}
diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
deleted file mode 100644
index ffe9eb28..00000000
--- a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
+++ /dev/null
@@ -1,10 +0,0 @@
----
-# XFS 4k block, 4k sector configuration
-ai_docker_fstype: "xfs"
-ai_docker_xfs_blocksize: 4096
-ai_docker_xfs_sectorsize: 4096
-ai_docker_xfs_mkfs_opts: ""
-filesystem_type: "xfs"
-filesystem_block_size: "4k-4ks"
-ai_filesystem: "xfs"
-ai_data_device_path: "/var/lib/docker"
\ No newline at end of file
diff --git a/playbooks/roles/ai_collect_results/files/analyze_results.py b/playbooks/roles/ai_collect_results/files/analyze_results.py
index 3d11fb11..2dc4a1d6 100755
--- a/playbooks/roles/ai_collect_results/files/analyze_results.py
+++ b/playbooks/roles/ai_collect_results/files/analyze_results.py
@@ -226,6 +226,68 @@ class ResultsAnalyzer:
 
         return fs_info
 
+    def _extract_filesystem_config(
+        self, result: Dict[str, Any]
+    ) -> tuple[str, str, str]:
+        """Extract filesystem type and block size from result data.
+        Returns (fs_type, block_size, config_key)"""
+        filename = result.get("_file", "")
+
+        # Primary: Extract filesystem type from filename (more reliable than JSON)
+        fs_type = "unknown"
+        block_size = "default"
+
+        if "xfs" in filename:
+            fs_type = "xfs"
+            # Check larger sizes first to avoid substring matches
+            if "64k" in filename and "64k-" in filename:
+                block_size = "64k"
+            elif "32k" in filename and "32k-" in filename:
+                block_size = "32k"
+            elif "16k" in filename and "16k-" in filename:
+                block_size = "16k"
+            elif "4k" in filename and "4k-" in filename:
+                block_size = "4k"
+        elif "ext4" in filename:
+            fs_type = "ext4"
+            if "16k" in filename:
+                block_size = "16k"
+            elif "4k" in filename:
+                block_size = "4k"
+        elif "btrfs" in filename:
+            fs_type = "btrfs"
+            block_size = "default"
+        else:
+            # Fallback to JSON data if filename parsing fails
+            fs_type = result.get("filesystem", "unknown")
+            self.logger.warning(
+                f"Could not determine filesystem from filename (unknown), using JSON data: {fs_type}"
+            )
+
+        config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+        return fs_type, block_size, config_key
+
+    def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
+        """Extract node hostname and determine if it's a dev node.
+        Returns (hostname, is_dev_node)"""
+        # Get hostname from system_info (preferred) or fall back to filename
+        system_info = result.get("system_info", {})
+        hostname = system_info.get("hostname", "")
+
+        # If no hostname in system_info, try extracting from filename
+        if not hostname:
+            filename = result.get("_file", "")
+            # Remove results_ prefix and .json suffix
+            hostname = filename.replace("results_", "").replace(".json", "")
+            # Remove iteration number if present (_1, _2, etc.)
+            if "_" in hostname and hostname.split("_")[-1].isdigit():
+                hostname = "_".join(hostname.split("_")[:-1])
+
+        # Determine if this is a dev node
+        is_dev = hostname.endswith("-dev")
+
+        return hostname, is_dev
+
     def load_results(self) -> bool:
         """Load all result files from the results directory"""
         try:
@@ -391,6 +453,8 @@ class ResultsAnalyzer:
         html.append(
             "        .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
         )
+        html.append("        .baseline-row { background-color: #e8f5e9; }")
+        html.append("        .dev-row { background-color: #e3f2fd; }")
         html.append("    </style>")
         html.append("</head>")
         html.append("<body>")
@@ -486,26 +550,69 @@ class ResultsAnalyzer:
         else:
             html.append(
                 "            <p>No storage device information available.</p>"
             )
 
-        # Filesystem section
-        html.append("        <div class='section'>")
-        html.append("            <h2>🗂️ Filesystem Configuration</h2>")
-        fs_info = self.system_info.get("filesystem_info", {})
-        html.append("            <table>")
-        html.append(
-            "                <tr><td>Filesystem Type</td><td>"
-            + str(fs_info.get("filesystem_type", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append(
-            "                <tr><td>Mount Point</td><td>"
-            + str(fs_info.get("mount_point", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append(
-            "                <tr><td>Mount Options</td><td>"
-            + str(fs_info.get("mount_options", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append("            </table>")
+        # Node Configuration section - Extract from actual benchmark results
+        html.append("        <div class='section'>")
+        html.append("            <h2>🗂️ Node Configuration</h2>")
+
+        # Collect node and filesystem information from benchmark results
+        node_configs = {}
+        for result in self.results_data:
+            # Extract node information
+            hostname, is_dev = self._extract_node_info(result)
+            fs_type, block_size, config_key = self._extract_filesystem_config(
+                result
+            )
+
+            system_info = result.get("system_info", {})
+            data_path = system_info.get("data_path", "/data/milvus")
+            mount_point = system_info.get("mount_point", "/data")
+            kernel_version = system_info.get("kernel_version", "unknown")
+
+            if hostname not in node_configs:
+                node_configs[hostname] = {
+                    "hostname": hostname,
+                    "node_type": "Development" if is_dev else "Baseline",
+                    "filesystem": fs_type,
+                    "block_size": block_size,
+                    "data_path": data_path,
+                    "mount_point": mount_point,
+                    "kernel": kernel_version,
+                    "test_count": 0,
+                }
+            node_configs[hostname]["test_count"] += 1
+
+        if node_configs:
+            html.append("            <table>")
+            html.append(
+                "                <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
+            )
+            # Sort nodes with baseline first, then dev
+            sorted_nodes = sorted(
+                node_configs.items(),
+                key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+            )
+            for hostname, config_info in sorted_nodes:
+                row_class = (
+                    "dev-row"
+                    if config_info["node_type"] == "Development"
+                    else "baseline-row"
+                )
+                html.append(f"                <tr class='{row_class}'>")
+                html.append(f"                    <td>{hostname}</td>")
+                html.append(f"                    <td>{config_info['node_type']}</td>")
+                html.append(f"                    <td>{config_info['filesystem']}</td>")
+                html.append(f"                    <td>{config_info['block_size']}</td>")
+                html.append(f"                    <td>{config_info['data_path']}</td>")
+                html.append(
+                    f"                    <td>{config_info['mount_point']}</td>"
+                )
+                html.append(f"                    <td>{config_info['kernel']}</td>")
+                html.append(f"                    <td>{config_info['test_count']}</td>")
+                html.append(f"                </tr>")
+            html.append("            </table>")
+        else:
+            html.append(
+                "            <p>No node configuration data found in results.</p>"
+            )
         html.append("        </div>")
 
         # Test Configuration Section
@@ -551,92 +658,192 @@ class ResultsAnalyzer:
             html.append("            </table>")
         html.append("        </div>")
 
-        # Performance Results Section
+        # Performance Results Section - Per Node
         html.append("        <div class='section'>")
-        html.append("            <h2>📊 Performance Results Summary</h2>")
+        html.append("            <h2>📊 Performance Results by Node</h2>")
 
         if self.results_data:
-            # Insert performance
-            insert_times = [
-                r.get("insert_performance", {}).get("total_time_seconds", 0)
-                for r in self.results_data
-            ]
-            insert_rates = [
-                r.get("insert_performance", {}).get("vectors_per_second", 0)
-                for r in self.results_data
-            ]
-
-            if insert_times and any(t > 0 for t in insert_times):
-                html.append("            <h3>📈 Vector Insert Performance</h3>")
-                html.append("            <table>")
-                html.append(
-                    f"                <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
-                )
-                html.append("            </table>")
+            # Group results by node
+            node_performance = {}
+
+            for result in self.results_data:
+                # Use node hostname as the grouping key
+                hostname, is_dev = self._extract_node_info(result)
+                fs_type, block_size, config_key = self._extract_filesystem_config(
+                    result
+                )
 
-            # Index performance
-            index_times = [
-                r.get("index_performance", {}).get("creation_time_seconds", 0)
-                for r in self.results_data
-            ]
-            if index_times and any(t > 0 for t in index_times):
-                html.append("            <h3>🔗 Index Creation Performance</h3>")
-                html.append("            <table>")
-                html.append(
-                    f"                <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
-                )
-                html.append("            </table>")
+                if hostname not in node_performance:
+                    node_performance[hostname] = {
+                        "hostname": hostname,
+                        "node_type": "Development" if is_dev else "Baseline",
+                        "insert_rates": [],
+                        "insert_times": [],
+                        "index_times": [],
+                        "query_performance": {},
+                        "filesystem": fs_type,
+                        "block_size": block_size,
+                    }
+
+                # Add insert performance
+                insert_perf = result.get("insert_performance", {})
+                if insert_perf:
+                    rate = insert_perf.get("vectors_per_second", 0)
+                    time = insert_perf.get("total_time_seconds", 0)
+                    if rate > 0:
+                        node_performance[hostname]["insert_rates"].append(rate)
+                    if time > 0:
+                        node_performance[hostname]["insert_times"].append(time)
+
+                # Add index performance
+                index_perf = result.get("index_performance", {})
+                if index_perf:
+                    time = index_perf.get("creation_time_seconds", 0)
+                    if time > 0:
+                        node_performance[hostname]["index_times"].append(time)
+
+                # Collect query performance (use first result for each node)
+                query_perf = result.get("query_performance", {})
+                if (
+                    query_perf
+                    and not node_performance[hostname]["query_performance"]
+                ):
+                    node_performance[hostname]["query_performance"] = query_perf
+
+            # Display results for each node, sorted with baseline first
+            sorted_nodes = sorted(
+                node_performance.items(),
+                key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+            )
+            for hostname, perf_data in sorted_nodes:
+                node_type_badge = (
+                    "🔵" if perf_data["node_type"] == "Development" else "🟢"
+                )
+                html.append(
+                    f"            <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
+                )
+                html.append(
+                    f"            <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
+                )
 
-            # Query performance
-            html.append("            <h3>🔍 Query Performance</h3>")
-            first_query_perf = self.results_data[0].get("query_performance", {})
-            if first_query_perf:
-                html.append("            <table>")
-                html.append(
-                    "                <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
-                )
-                for topk, topk_data in first_query_perf.items():
-                    for batch, batch_data in topk_data.items():
-                        qps = batch_data.get("queries_per_second", 0)
-                        avg_time = batch_data.get("average_time_seconds", 0) * 1000
-
-                        # Color coding for performance
-                        qps_class = ""
-                        if qps > 1000:
-                            qps_class = "performance-good"
-                        elif qps > 100:
-                            qps_class = "performance-warning"
-                        else:
-                            qps_class = "performance-poor"
-
-                        html.append(f"                <tr>")
-                        html.append(
-                            f"                    <td>{topk.replace('topk_', 'Top-')}</td>"
-                        )
-                        html.append(
-                            f"                    <td>{batch.replace('batch_', 'Batch ')}</td>"
-                        )
-                        html.append(
-                            f"                    <td class='{qps_class}'>{qps:.2f}</td>"
-                        )
-                        html.append(f"                    <td>{avg_time:.2f}</td>")
-                        html.append(f"                </tr>")
+                # Insert performance
+                insert_rates = perf_data["insert_rates"]
+                if insert_rates:
+                    html.append("            <h4>📈 Vector Insert Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        f"                <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
+                    )
+                    html.append("            </table>")
+
+                # Index performance
+                index_times = perf_data["index_times"]
+                if index_times:
+                    html.append("            <h4>🔗 Index Creation Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        f"                <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
+                    )
+                    html.append("            </table>")
+
+                # Query performance
+                query_perf = perf_data["query_performance"]
+                if query_perf:
+                    html.append("            <h4>🔍 Query Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        "                <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+                    )
 
-                html.append("            </table>")
+                    for topk, topk_data in query_perf.items():
+                        for batch, batch_data in topk_data.items():
+                            qps = batch_data.get("queries_per_second", 0)
+                            avg_time = (
+                                batch_data.get("average_time_seconds", 0) * 1000
+                            )
+
+                            # Color coding for performance
+                            qps_class = ""
+                            if qps > 1000:
+                                qps_class = "performance-good"
+                            elif qps > 100:
+                                qps_class = "performance-warning"
+                            else:
+                                qps_class = "performance-poor"
+
+                            html.append(f"                <tr>")
+                            html.append(
+                                f"                    <td>{topk.replace('topk_', 'Top-')}</td>"
+                            )
+                            html.append(
+                                f"                    <td>{batch.replace('batch_', 'Batch ')}</td>"
+                            )
+                            html.append(
+                                f"                    <td class='{qps_class}'>{qps:.2f}</td>"
+                            )
+                            html.append(f"                    <td>{avg_time:.2f}</td>")
+                            html.append(f"                </tr>")
+                    html.append("            </table>")
+
+                html.append("            <br>")  # Add spacing between configurations
 
         html.append("        </div>")
 
         # Footer
+        # Performance Graphs Section
+        html.append("        <div class='section'>")
+        html.append("            <h2>📈 Performance Visualizations</h2>")
+        html.append(
+            "            <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
+        )
+        html.append("            <ul>")
+        html.append(
+            "                <li><b>Insert Performance</b>: Shows vector insertion rates and times for each filesystem configuration</li>"
+        )
+        html.append(
+            "                <li><b>Query Performance</b>: Displays query performance heatmaps for different Top-K and batch sizes</li>"
+        )
+        html.append(
+            "                <li><b>Index Performance</b>: Compares index creation times across filesystems</li>"
+        )
+        html.append(
+            "                <li><b>Performance Matrix</b>: Comprehensive comparison matrix of all metrics</li>"
+        )
+        html.append(
+            "                <li><b>Filesystem Comparison</b>: Side-by-side comparison of filesystem performance</li>"
+        )
+        html.append("            </ul>")
+        html.append(
+            "            <p>Note: Graphs are generated as separate PNG files in the same directory as this report.</p>"
+        )
+        html.append("            <div>")
+        html.append(
+            "                <a href='...'>Insert Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Query Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Index Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Performance Matrix</a>"
+        )
+        html.append(
+            "                <a href='...'>Filesystem Comparison</a>"
+        )
+        html.append("            </div>")
+        html.append("        </div>")
+
         html.append("        <div class='section'>")
         html.append("            <h2>📝 Notes</h2>")
         html.append("            <ul>")
@@ -661,10 +868,11 @@ class ResultsAnalyzer:
             return "\n".join(html)
 
         except Exception as e:
-            self.logger.error(f"Error generating HTML report: {e}")
-            return (
-                f"<html><body><p>Error generating HTML report: {e}</p></body></html>"
-            )
+            import traceback
+
+            tb = traceback.format_exc()
+            self.logger.error(f"Error generating HTML report: {e}\n{tb}")
+            return f"<html><body><p>Error generating HTML report: {e}</p><pre>{tb}</pre></body></html>"
 
     def generate_graphs(self) -> bool:
         """Generate performance visualization graphs"""
@@ -691,6 +899,9 @@ class ResultsAnalyzer:
             # Graph 4: Performance Comparison Matrix
             self._plot_performance_matrix()
 
+            # Graph 5: Multi-filesystem Comparison (if applicable)
+            self._plot_filesystem_comparison()
+
             self.logger.info("Graphs generated successfully")
             return True
 
@@ -699,34 +910,188 @@ class ResultsAnalyzer:
             return False
 
     def _plot_insert_performance(self):
-        """Plot insert performance metrics"""
-        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+        """Plot insert performance metrics with node differentiation"""
+        # Group data by node
+        node_performance = {}
 
-        # Extract insert data
-        iterations = []
-        insert_rates = []
-        insert_times = []
+        for result in self.results_data:
+            hostname, is_dev = self._extract_node_info(result)
+
+            if hostname not in node_performance:
+                node_performance[hostname] = {
+                    "insert_rates": [],
+                    "insert_times": [],
+                    "iterations": [],
+                    "is_dev": is_dev,
+                }
 
-        for i, result in enumerate(self.results_data):
             insert_perf = result.get("insert_performance", {})
             if insert_perf:
-                iterations.append(i + 1)
-                insert_rates.append(insert_perf.get("vectors_per_second", 0))
-                insert_times.append(insert_perf.get("total_time_seconds", 0))
-
-        # Plot insert rate
-        ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
-        ax1.set_xlabel("Iteration")
-        ax1.set_ylabel("Vectors/Second")
-        ax1.set_title("Vector Insert Rate Performance")
-        ax1.grid(True, alpha=0.3)
-
-        # Plot insert time
-        ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
-        ax2.set_xlabel("Iteration")
-        ax2.set_ylabel("Total Time (seconds)")
-        ax2.set_title("Vector Insert Time Performance")
-        ax2.grid(True, alpha=0.3)
+                node_performance[hostname]["insert_rates"].append(
+                    insert_perf.get("vectors_per_second", 0)
+                )
+                node_performance[hostname]["insert_times"].append(
+                    insert_perf.get("total_time_seconds", 0)
+                )
node_performance[hostname]["iterations"].append( + len(node_performance[hostname]["insert_rates"]) + ) + + # Check if we have multiple nodes + if len(node_performance) > 1: + # Multi-node mode: separate lines for each node + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7)) + + # Sort nodes with baseline first, then dev + sorted_nodes = sorted( + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + + # Create color palettes for baseline and dev nodes + baseline_colors = [ + "#2E7D32", + "#43A047", + "#66BB6A", + "#81C784", + "#A5D6A7", + "#C8E6C9", + ] # Greens + dev_colors = [ + "#0D47A1", + "#1565C0", + "#1976D2", + "#1E88E5", + "#2196F3", + "#42A5F5", + "#64B5F6", + ] # Blues + + # Additional colors if needed + extra_colors = [ + "#E65100", + "#F57C00", + "#FF9800", + "#FFB300", + "#FFC107", + "#FFCA28", + ] # Oranges + + # Line styles to cycle through + line_styles = ["-", "--", "-.", ":"] + markers = ["o", "s", "^", "v", "D", "p", "*", "h"] + + baseline_idx = 0 + dev_idx = 0 + + # Use different colors and styles for each node + for idx, (hostname, perf_data) in enumerate(sorted_nodes): + if not perf_data["insert_rates"]: + continue + + # Choose color and style based on node type and index + if perf_data["is_dev"]: + # Development nodes - blues + color = dev_colors[dev_idx % len(dev_colors)] + linestyle = line_styles[ + (dev_idx // len(dev_colors)) % len(line_styles) + ] + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev + label = f"{hostname} (Dev)" + dev_idx += 1 + else: + # Baseline nodes - greens + color = baseline_colors[baseline_idx % len(baseline_colors)] + linestyle = line_styles[ + (baseline_idx // len(baseline_colors)) % len(line_styles) + ] + marker = markers[ + baseline_idx % 4 + ] # Use first 4 markers for baseline + label = f"{hostname} (Baseline)" + baseline_idx += 1 + + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate with alpha for better visibility + ax1.plot( + iterations, 
+ perf_data["insert_rates"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title("Milvus Insert Rate by Node") + ax1.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Milvus Insert Time by Node") + ax2.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + plt.suptitle( + "Insert Performance Analysis: Baseline vs Development", + fontsize=14, + y=1.02, + ) + else: + # Single node mode: original behavior + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + + # Extract insert data from single node + hostname = list(node_performance.keys())[0] if node_performance else None + if hostname: + perf_data = node_performance[hostname] + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate + ax1.plot( + iterations, + perf_data["insert_rates"], + "b-o", + linewidth=2, + markersize=6, + ) + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title(f"Vector Insert Rate Performance - {hostname}") + ax1.grid(True, alpha=0.3) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + "r-o", + linewidth=2, + markersize=6, + ) + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title(f"Vector Insert Time Performance - {hostname}") + ax2.grid(True, alpha=0.3) plt.tight_layout() output_file = 
os.path.join( @@ -739,52 +1104,110 @@ class ResultsAnalyzer: plt.close() def _plot_query_performance(self): - """Plot query performance metrics""" + """Plot query performance metrics comparing baseline vs dev nodes""" if not self.results_data: return - # Collect query performance data - query_data = [] + # Group data by filesystem configuration + fs_groups = {} for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} + query_perf = result.get("query_performance", {}) - for topk, topk_data in query_perf.items(): - for batch, batch_data in topk_data.items(): - query_data.append( - { - "topk": topk.replace("topk_", ""), - "batch": batch.replace("batch_", ""), - "qps": batch_data.get("queries_per_second", 0), - "avg_time": batch_data.get("average_time_seconds", 0) - * 1000, # Convert to ms - } - ) + if query_perf: + node_type = "dev" if is_dev else "baseline" + for topk, topk_data in query_perf.items(): + for batch, batch_data in topk_data.items(): + fs_groups[config_key][node_type].append( + { + "hostname": hostname, + "topk": topk.replace("topk_", ""), + "batch": batch.replace("batch_", ""), + "qps": batch_data.get("queries_per_second", 0), + "avg_time": batch_data.get("average_time_seconds", 0) + * 1000, + } + ) - if not query_data: + if not fs_groups: return - df = pd.DataFrame(query_data) + # Create subplots for each filesystem config + n_configs = len(fs_groups) + fig_height = max(8, 4 * n_configs) + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height)) - # Create subplots - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + if n_configs == 1: + axes = axes.reshape(1, -1) - # QPS heatmap - qps_pivot = df.pivot_table( - values="qps", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd") - 
ax1.set_title("Queries Per Second (QPS)") - ax1.set_xlabel("Batch Size") - ax1.set_ylabel("Top-K") - - # Latency heatmap - latency_pivot = df.pivot_table( - values="avg_time", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd") - ax2.set_title("Average Query Latency (ms)") - ax2.set_xlabel("Batch Size") - ax2.set_ylabel("Top-K") + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())): + # Create DataFrames for baseline and dev + baseline_df = ( + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame() + ) + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame() + + # Baseline QPS heatmap + ax_base = axes[idx][0] + if not baseline_df.empty: + baseline_pivot = baseline_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + baseline_pivot, + annot=True, + fmt=".1f", + ax=ax_base, + cmap="Greens", + cbar_kws={"label": "QPS"}, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + ax_base.set_xlabel("Batch Size") + ax_base.set_ylabel("Top-K") + else: + ax_base.text( + 0.5, + 0.5, + f"No baseline data for {config_key}", + ha="center", + va="center", + transform=ax_base.transAxes, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + # Dev QPS heatmap + ax_dev = axes[idx][1] + if not dev_df.empty: + dev_pivot = dev_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + dev_pivot, + annot=True, + fmt=".1f", + ax=ax_dev, + cmap="Blues", + cbar_kws={"label": "QPS"}, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + ax_dev.set_xlabel("Batch Size") + ax_dev.set_ylabel("Top-K") + else: + ax_dev.text( + 0.5, + 0.5, + f"No dev data for {config_key}", + ha="center", + va="center", + transform=ax_dev.transAxes, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + + plt.suptitle("Query Performance: Baseline vs Development", 
fontsize=16, y=1.02) plt.tight_layout() output_file = os.path.join( self.output_dir, @@ -796,32 +1219,101 @@ class ResultsAnalyzer: plt.close() def _plot_index_performance(self): - """Plot index creation performance""" - iterations = [] - index_times = [] + """Plot index creation performance comparing baseline vs dev""" + # Group by filesystem configuration + fs_groups = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} - for i, result in enumerate(self.results_data): index_perf = result.get("index_performance", {}) if index_perf: - iterations.append(i + 1) - index_times.append(index_perf.get("creation_time_seconds", 0)) + time = index_perf.get("creation_time_seconds", 0) + if time > 0: + node_type = "dev" if is_dev else "baseline" + fs_groups[config_key][node_type].append(time) - if not index_times: + if not fs_groups: return - plt.figure(figsize=(10, 6)) - plt.bar(iterations, index_times, alpha=0.7, color="green") - plt.xlabel("Iteration") - plt.ylabel("Index Creation Time (seconds)") - plt.title("Index Creation Performance") - plt.grid(True, alpha=0.3) - - # Add average line - avg_time = np.mean(index_times) - plt.axhline( - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s" + # Create comparison bar chart + fig, ax = plt.subplots(figsize=(14, 8)) + + configs = sorted(fs_groups.keys()) + x = np.arange(len(configs)) + width = 0.35 + + # Calculate averages for each config + baseline_avgs = [] + dev_avgs = [] + baseline_stds = [] + dev_stds = [] + + for config in configs: + baseline_times = fs_groups[config]["baseline"] + dev_times = fs_groups[config]["dev"] + + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0) + dev_avgs.append(np.mean(dev_times) if dev_times else 0) + baseline_stds.append(np.std(baseline_times) if 
baseline_times else 0) + dev_stds.append(np.std(dev_times) if dev_times else 0) + + # Create bars + bars1 = ax.bar( + x - width / 2, + baseline_avgs, + width, + yerr=baseline_stds, + label="Baseline", + color="#4CAF50", + capsize=5, + ) + bars2 = ax.bar( + x + width / 2, + dev_avgs, + width, + yerr=dev_stds, + label="Development", + color="#2196F3", + capsize=5, ) - plt.legend() + + # Add value labels on bars + for bar, val in zip(bars1, baseline_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + for bar, val in zip(bars2, dev_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + ax.set_xlabel("Filesystem Configuration", fontsize=12) + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12) + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14) + ax.set_xticks(x) + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right") + ax.legend(loc="upper right") + ax.grid(True, alpha=0.3, axis="y") output_file = os.path.join( self.output_dir, @@ -833,61 +1325,148 @@ class ResultsAnalyzer: plt.close() def _plot_performance_matrix(self): - """Plot comprehensive performance comparison matrix""" + """Plot performance comparison matrix for each filesystem config""" if len(self.results_data) < 2: return - # Extract key metrics for comparison - metrics = [] - for i, result in enumerate(self.results_data): + # Group by filesystem configuration + fs_metrics = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_metrics: + fs_metrics[config_key] = {"baseline": [], "dev": []} + + # Collect metrics insert_perf = result.get("insert_performance", {}) 
index_perf = result.get("index_performance", {}) + query_perf = result.get("query_performance", {}) metric = { - "iteration": i + 1, + "hostname": hostname, "insert_rate": insert_perf.get("vectors_per_second", 0), "index_time": index_perf.get("creation_time_seconds", 0), } - # Add query metrics - query_perf = result.get("query_performance", {}) + # Get representative query performance (topk_10, batch_1) if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: metric["query_qps"] = query_perf["topk_10"]["batch_1"].get( "queries_per_second", 0 ) + else: + metric["query_qps"] = 0 - metrics.append(metric) + node_type = "dev" if is_dev else "baseline" + fs_metrics[config_key][node_type].append(metric) - df = pd.DataFrame(metrics) + if not fs_metrics: + return - # Normalize metrics for comparison - numeric_cols = ["insert_rate", "index_time", "query_qps"] - for col in numeric_cols: - if col in df.columns: - df[f"{col}_norm"] = (df[col] - df[col].min()) / ( - df[col].max() - df[col].min() + 1e-6 - ) + # Create subplots for each filesystem + n_configs = len(fs_metrics) + n_cols = min(3, n_configs) + n_rows = (n_configs + n_cols - 1) // n_cols + + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5)) + if n_rows == 1 and n_cols == 1: + axes = [[axes]] + elif n_rows == 1: + axes = [axes] + elif n_cols == 1: + axes = [[ax] for ax in axes] + + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())): + row = idx // n_cols + col = idx % n_cols + ax = axes[row][col] + + # Calculate averages + baseline_metrics = data["baseline"] + dev_metrics = data["dev"] + + if baseline_metrics and dev_metrics: + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"] + + baseline_avg = [ + np.mean([m["insert_rate"] for m in baseline_metrics]), + np.mean([m["index_time"] for m in baseline_metrics]), + np.mean([m["query_qps"] for m in baseline_metrics]), + ] - # Create radar chart - fig, ax = plt.subplots(figsize=(10, 8), 
subplot_kw=dict(projection="polar")) + dev_avg = [ + np.mean([m["insert_rate"] for m in dev_metrics]), + np.mean([m["index_time"] for m in dev_metrics]), + np.mean([m["query_qps"] for m in dev_metrics]), + ] - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist() - angles += angles[:1] # Complete the circle + x = np.arange(len(categories)) + width = 0.35 - for i, row in df.iterrows(): - values = [row.get(f"{col}_norm", 0) for col in numeric_cols] - values += values[:1] # Complete the circle + bars1 = ax.bar( + x - width / 2, + baseline_avg, + width, + label="Baseline", + color="#4CAF50", + ) + bars2 = ax.bar( + x + width / 2, dev_avg, width, label="Development", color="#2196F3" + ) - ax.plot( - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}' - ) - ax.fill(angles, values, alpha=0.25) + # Add value labels + for bar, val in zip(bars1, baseline_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) - ax.set_xticks(angles[:-1]) - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"]) - ax.set_ylim(0, 1) - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08) - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0)) + for bar, val in zip(bars2, dev_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) + + ax.set_xlabel("Metrics") + ax.set_ylabel("Value") + ax.set_title(f"{config_key.upper()}") + ax.set_xticks(x) + ax.set_xticklabels(categories) + ax.legend(loc="upper right", fontsize=8) + ax.grid(True, alpha=0.3, axis="y") + else: + ax.text( + 0.5, + 0.5, + f"Insufficient data\nfor {config_key}", + ha="center", + va="center", + transform=ax.transAxes, + ) + ax.set_title(f"{config_key.upper()}") + + # Hide unused subplots + for idx 
in range(n_configs, n_rows * n_cols): + row = idx // n_cols + col = idx % n_cols + axes[row][col].set_visible(False) + + plt.suptitle( + "Performance Comparison Matrix: Baseline vs Development", + fontsize=14, + y=1.02, + ) output_file = os.path.join( self.output_dir, @@ -898,6 +1477,149 @@ class ResultsAnalyzer: ) plt.close() + def _plot_filesystem_comparison(self): + """Plot node performance comparison chart""" + if len(self.results_data) < 2: + return + + # Group results by node + node_performance = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "index_times": [], + "query_qps": [], + "is_dev": is_dev, + } + + # Collect metrics + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + + index_perf = result.get("index_performance", {}) + if index_perf: + node_performance[hostname]["index_times"].append( + index_perf.get("creation_time_seconds", 0) + ) + + # Get top-10 batch-1 query performance as representative + query_perf = result.get("query_performance", {}) + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0) + node_performance[hostname]["query_qps"].append(qps) + + # Only create comparison if we have multiple nodes + if len(node_performance) > 1: + # Calculate averages + node_metrics = {} + for hostname, perf_data in node_performance.items(): + node_metrics[hostname] = { + "avg_insert_rate": ( + np.mean(perf_data["insert_rates"]) + if perf_data["insert_rates"] + else 0 + ), + "avg_index_time": ( + np.mean(perf_data["index_times"]) + if perf_data["index_times"] + else 0 + ), + "avg_query_qps": ( + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0 + ), + "is_dev": perf_data["is_dev"], + } + + # Create 
comparison bar chart with more space + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8)) + + # Sort nodes with baseline first + sorted_nodes = sorted( + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + node_names = [hostname for hostname, _ in sorted_nodes] + + # Use different colors for baseline vs dev + colors = [ + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3" + for hostname in node_names + ] + + # Add labels for clarity + labels = [ + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})" + for hostname in node_names + ] + + # Insert rate comparison + insert_rates = [ + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names + ] + bars1 = ax1.bar(labels, insert_rates, color=colors) + ax1.set_title("Average Milvus Insert Rate by Node") + ax1.set_ylabel("Vectors/Second") + # Rotate labels for better readability + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Index time comparison (lower is better) + index_times = [ + node_metrics[hostname]["avg_index_time"] for hostname in node_names + ] + bars2 = ax2.bar(labels, index_times, color=colors) + ax2.set_title("Average Milvus Index Time by Node") + ax2.set_ylabel("Seconds (Lower is Better)") + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Query QPS comparison + query_qps = [ + node_metrics[hostname]["avg_query_qps"] for hostname in node_names + ] + bars3 = ax3.bar(labels, query_qps, color=colors) + ax3.set_title("Average Milvus Query QPS by Node") + ax3.set_ylabel("Queries/Second") + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Add value labels on bars + for bars, values in [ + (bars1, insert_rates), + (bars2, index_times), + (bars3, query_qps), + ]: + for bar, value in zip(bars, values): + height = bar.get_height() + ax = bar.axes + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height + height * 0.01, + f"{value:.1f}", + ha="center", + va="bottom", + 
fontsize=10, + ) + + plt.suptitle( + "Milvus Performance Comparison: Baseline vs Development Nodes", + fontsize=16, + y=1.02, + ) + plt.tight_layout() + + output_file = os.path.join( + self.output_dir, + f"filesystem_comparison.{self.config.get('graph_format', 'png')}", + ) + plt.savefig( + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight" + ) + plt.close() + def analyze(self) -> bool: """Run complete analysis""" self.logger.info("Starting results analysis...") diff --git a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py index 645bac9e..b3681ff9 100755 --- a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py +++ b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py @@ -29,17 +29,18 @@ def extract_filesystem_from_filename(filename): if "_" in node_name: parts = node_name.split("_") node_name = "_".join(parts[:-1]) # Remove last part (iteration) - + # Extract filesystem type from node name if "-xfs-" in node_name: return "xfs" elif "-ext4-" in node_name: - return "ext4" + return "ext4" elif "-btrfs-" in node_name: return "btrfs" - + return "unknown" + def extract_node_config_from_filename(filename): """Extract detailed node configuration from filename""" # Expected format: results_debian13-ai-xfs-4k-4ks_1.json @@ -50,14 +51,15 @@ def extract_node_config_from_filename(filename): if "_" in node_name: parts = node_name.split("_") node_name = "_".join(parts[:-1]) # Remove last part (iteration) - + # Remove -dev suffix if present node_name = node_name.replace("-dev", "") - + return node_name.replace("debian13-ai-", "") - + return "unknown" + def detect_filesystem(): """Detect the filesystem type of /data on test nodes""" # This is now a fallback - we primarily use filename-based detection @@ -104,7 +106,7 @@ def load_results(results_dir): # Extract node type from filename filename = os.path.basename(json_file) data["filename"] = 
filename - + # Extract filesystem type and config from filename data["filesystem"] = extract_filesystem_from_filename(filename) data["node_config"] = extract_node_config_from_filename(filename) diff --git a/playbooks/roles/ai_collect_results/files/generate_graphs.py b/playbooks/roles/ai_collect_results/files/generate_graphs.py index 53a835e2..fafc62bf 100755 --- a/playbooks/roles/ai_collect_results/files/generate_graphs.py +++ b/playbooks/roles/ai_collect_results/files/generate_graphs.py @@ -9,7 +9,6 @@ import sys import glob import numpy as np import matplotlib - matplotlib.use("Agg") # Use non-interactive backend import matplotlib.pyplot as plt from datetime import datetime @@ -17,68 +16,78 @@ from pathlib import Path from collections import defaultdict +def _extract_filesystem_config(result): + """Extract filesystem type and block size from result data. + Returns (fs_type, block_size, config_key)""" + filename = result.get("_file", "") + + # Primary: Extract filesystem type from filename (more reliable than JSON) + fs_type = "unknown" + block_size = "default" + + if "xfs" in filename: + fs_type = "xfs" + # Check larger sizes first to avoid substring matches + if "64k" in filename and "64k-" in filename: + block_size = "64k" + elif "32k" in filename and "32k-" in filename: + block_size = "32k" + elif "16k" in filename and "16k-" in filename: + block_size = "16k" + elif "4k" in filename and "4k-" in filename: + block_size = "4k" + elif "ext4" in filename: + fs_type = "ext4" + if "4k" in filename and "4k-" in filename: + block_size = "4k" + elif "16k" in filename and "16k-" in filename: + block_size = "16k" + elif "btrfs" in filename: + fs_type = "btrfs" + + # Fallback: Check JSON data if filename parsing failed + if fs_type == "unknown": + fs_type = result.get("filesystem", "unknown") + + # Create descriptive config key + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type + return fs_type, block_size, config_key + + +def 
_extract_node_info(result): + """Extract node hostname and determine if it's a dev node. + Returns (hostname, is_dev_node)""" + # Get hostname from system_info (preferred) or fall back to filename + system_info = result.get("system_info", {}) + hostname = system_info.get("hostname", "") + + # If no hostname in system_info, try extracting from filename + if not hostname: + filename = result.get("_file", "") + # Remove results_ prefix and .json suffix + hostname = filename.replace("results_", "").replace(".json", "") + # Remove iteration number if present (_1, _2, etc.) + if "_" in hostname and hostname.split("_")[-1].isdigit(): + hostname = "_".join(hostname.split("_")[:-1]) + + # Determine if this is a dev node + is_dev = hostname.endswith("-dev") + + return hostname, is_dev + + def load_results(results_dir): """Load all JSON result files from the directory""" results = [] - json_files = glob.glob(os.path.join(results_dir, "*.json")) + # Only load results_*.json files, not consolidated or other JSON files + json_files = glob.glob(os.path.join(results_dir, "results_*.json")) for json_file in json_files: try: with open(json_file, "r") as f: data = json.load(f) - # Extract filesystem info - prefer from JSON data over filename - filename = os.path.basename(json_file) - - # First, try to get filesystem from the JSON data itself - fs_type = data.get("filesystem", None) - - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type: - parts = filename.replace("results_", "").replace(".json", "").split("-") - - # Parse host info - if "debian13-ai-" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") - .replace("_2.json", "") - .replace("_3.json", "") - .split("-") - ) - if "xfs" in host_parts[0]: - fs_type = "xfs" - # Extract block size (e.g., "4k", "16k", etc.) 
- block_size = host_parts[1] if len(host_parts) > 1 else "unknown" - elif "ext4" in host_parts[0]: - fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: - fs_type = "btrfs" - block_size = "default" - else: - fs_type = "unknown" - block_size = "unknown" - else: - fs_type = "unknown" - block_size = "unknown" - else: - # If filesystem came from JSON, set appropriate block size - if fs_type == "btrfs": - block_size = "default" - elif fs_type in ["ext4", "xfs"]: - block_size = data.get("block_size", "4k") - else: - block_size = data.get("block_size", "default") - - is_dev = "dev" in filename - - # Use filesystem from JSON if available, otherwise use parsed value - if "filesystem" not in data: - data["filesystem"] = fs_type - data["block_size"] = block_size - data["is_dev"] = is_dev - data["filename"] = filename - + # Add filename for filesystem detection + data["_file"] = os.path.basename(json_file) results.append(data) except Exception as e: print(f"Error loading {json_file}: {e}") @@ -86,554 +95,243 @@ def load_results(results_dir): return results -def create_filesystem_comparison_chart(results, output_dir): - """Create a bar chart comparing performance across filesystems""" - # Group by filesystem and baseline/dev - fs_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract actual performance data from results - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - fs_data[fs][category].append(insert_qps) - - # Prepare data for plotting - filesystems = list(fs_data.keys()) - baseline_means = [ - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0 - for fs in filesystems - ] - dev_means = [ - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in 
filesystems
-    ]
-
-    x = np.arange(len(filesystems))
-    width = 0.35
-
-    fig, ax = plt.subplots(figsize=(10, 6))
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
-    )
-
-    ax.set_xlabel("Filesystem")
-    ax.set_ylabel("Insert QPS")
-    ax.set_title("Vector Database Performance by Filesystem")
-    ax.set_xticks(x)
-    ax.set_xticklabels(filesystems)
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels on bars
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.0f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                )
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
-    plt.close()
-
-
-def create_block_size_analysis(results, output_dir):
-    """Create analysis for different block sizes (XFS specific)"""
-    # Filter XFS results
-    xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
-
-    if not xfs_results:
+def create_simple_performance_trends(results, output_dir):
+    """Create performance trends chart grouped by filesystem configuration"""
+    if not results:
         return
-    # Group by block size
-    block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
-    for result in xfs_results:
-        block_size = result.get("block_size", "unknown")
-        category = "dev" if result.get("is_dev", False) else "baseline"
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-        block_size_data[block_size][category].append(insert_qps)
-
-    # Sort block sizes
-    block_sizes = sorted(
-        block_size_data.keys(),
-        key=lambda x: (
-            int(x.replace("k", "").replace("s", ""))
-            if x not in ["unknown", "default"]
-            else 0
-        ),
-    )
-
-    # Create grouped bar chart
-    baseline_means = [
-        (
-            np.mean(block_size_data[bs]["baseline"])
-            if block_size_data[bs]["baseline"]
-            else 0
-        )
-        for bs in block_sizes
-    ]
-    dev_means = [
-        np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
-        for bs in block_sizes
-    ]
-
-    x = np.arange(len(block_sizes))
-    width = 0.35
-
-    fig, ax = plt.subplots(figsize=(12, 6))
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_means, width, label="Development", color="#d62728"
-    )
-
-    ax.set_xlabel("Block Size")
-    ax.set_ylabel("Insert QPS")
-    ax.set_title("XFS Performance by Block Size")
-    ax.set_xticks(x)
-    ax.set_xticklabels(block_sizes)
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.0f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                )
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
-    plt.close()
-
-
-def create_heatmap_analysis(results, output_dir):
-    """Create a heatmap showing performance across all configurations"""
-    # Group data by configuration and version
-    config_data = defaultdict(
-        lambda: {
-            "baseline": {"insert": 0, "query": 0},
-            "dev": {"insert": 0, "query": 0},
-        }
-    )
+    # Group results by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "insert_rates": [],
+        "insert_times": [],
+        "iterations": [],
+    })
 
     for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{fs}-{block_size}"
-        version = "dev" if result.get("is_dev", False) else "baseline"
-
-        # Get actual insert performance
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        config_data[config][version]["insert"] = insert_qps
-        config_data[config][version]["query"] = query_qps
-
-    # Sort configurations
-    configs = sorted(config_data.keys())
-
-    # Prepare data for heatmap
-    insert_baseline = [config_data[c]["baseline"]["insert"] for c in configs]
-    insert_dev = [config_data[c]["dev"]["insert"] for c in configs]
-    query_baseline = [config_data[c]["baseline"]["query"] for c in configs]
-    query_dev = [config_data[c]["dev"]["query"] for c in configs]
-
-    # Create figure with custom heatmap
-    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
-
-    # Create data matrices
-    insert_data = np.array([insert_baseline, insert_dev]).T
-    query_data = np.array([query_baseline, query_dev]).T
-
-    # Insert QPS heatmap
-    im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
-    ax1.set_xticks([0, 1])
-    ax1.set_xticklabels(["Baseline", "Development"])
-    ax1.set_yticks(range(len(configs)))
-    ax1.set_yticklabels(configs)
-    ax1.set_title("Insert Performance Heatmap")
-    ax1.set_ylabel("Configuration")
-
-    # Add text annotations
-    for i in range(len(configs)):
-        for j in range(2):
-            text = ax1.text(
-                j,
-                i,
-                f"{int(insert_data[i, j])}",
-                ha="center",
-                va="center",
-                color="black",
-            )
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
 
-    # Add colorbar
-    cbar1 = plt.colorbar(im1, ax=ax1)
-    cbar1.set_label("Insert QPS")
-
-    # Query QPS heatmap
-    im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
-    ax2.set_xticks([0, 1])
-    ax2.set_xticklabels(["Baseline", "Development"])
-    ax2.set_yticks(range(len(configs)))
-    ax2.set_yticklabels(configs)
-    ax2.set_title("Query Performance Heatmap")
-
-    # Add text annotations
-    for i in range(len(configs)):
-        for j in range(2):
-            text = ax2.text(
-                j,
-                i,
-                f"{int(query_data[i, j])}",
-                ha="center",
-                va="center",
-                color="black",
+        insert_perf = result.get("insert_performance", {})
+        if insert_perf:
+            fs_performance[config_key]["insert_rates"].append(
+                insert_perf.get("vectors_per_second", 0)
+            )
+            fs_performance[config_key]["insert_times"].append(
+                insert_perf.get("total_time_seconds", 0)
+            )
+            fs_performance[config_key]["iterations"].append(
+                len(fs_performance[config_key]["insert_rates"])
             )
 
-    # Add colorbar
-    cbar2 = plt.colorbar(im2, ax=ax2)
-    cbar2.set_label("Query QPS")
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150)
-    plt.close()
-
-
-def create_performance_trends(results, output_dir):
-    """Create line charts showing performance trends"""
-    # Group by filesystem type
-    fs_types = defaultdict(
-        lambda: {
-            "configs": [],
-            "baseline_insert": [],
-            "dev_insert": [],
-            "baseline_query": [],
-            "dev_query": [],
-        }
-    )
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{block_size}"
-
-        if config not in fs_types[fs]["configs"]:
-            fs_types[fs]["configs"].append(config)
-            fs_types[fs]["baseline_insert"].append(0)
-            fs_types[fs]["dev_insert"].append(0)
-            fs_types[fs]["baseline_query"].append(0)
-            fs_types[fs]["dev_query"].append(0)
-
-        idx = fs_types[fs]["configs"].index(config)
-
-        # Calculate average query QPS from all test configurations
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        if result.get("is_dev", False):
-            if "insert_performance" in result:
-                fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["dev_query"][idx] = query_qps
-        else:
-            if "insert_performance" in result:
-                fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["baseline_query"][idx] = query_qps
-
-    # Create separate plots for each filesystem
-    for fs, data in fs_types.items():
-        if not data["configs"]:
-            continue
-
-        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-        x = range(len(data["configs"]))
-
-        # Insert performance
-        ax1.plot(
-            x,
-            data["baseline_insert"],
-            "o-",
-            label="Baseline",
-            linewidth=2,
-            markersize=8,
-        )
-        ax1.plot(
-            x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax1.set_xlabel("Configuration")
-        ax1.set_ylabel("Insert QPS")
-        ax1.set_title(f"{fs.upper()} Insert Performance")
-        ax1.set_xticks(x)
-        ax1.set_xticklabels(data["configs"])
-        ax1.legend()
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate lines for each filesystem
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        colors = ["b", "r", "g", "m", "c", "y", "k"]
+        color_idx = 0
+
+        for config_key, perf_data in fs_performance.items():
+            if not perf_data["insert_rates"]:
+                continue
+
+            color = colors[color_idx % len(colors)]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=config_key.upper(),
+            )
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=config_key.upper(),
+            )
+
+            color_idx += 1
+
+        ax1.set_xlabel("Iteration")
+        ax1.set_ylabel("Vectors/Second")
+        ax1.set_title("Milvus Insert Rate by Storage Filesystem")
         ax1.grid(True, alpha=0.3)
-
-        # Query performance
-        ax2.plot(
-            x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
-        )
-        ax2.plot(
-            x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax2.set_xlabel("Configuration")
-        ax2.set_ylabel("Query QPS")
-        ax2.set_title(f"{fs.upper()} Query Performance")
-        ax2.set_xticks(x)
-        ax2.set_xticklabels(data["configs"])
-        ax2.legend()
+        ax1.legend()
+
+        ax2.set_xlabel("Iteration")
+        ax2.set_ylabel("Total Time (seconds)")
+        ax2.set_title("Milvus Insert Time by Storage Filesystem")
         ax2.grid(True, alpha=0.3)
-
-        plt.tight_layout()
-        plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
-        plt.close()
+        ax2.legend()
+    else:
+        # Single filesystem mode: original behavior
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        # Extract insert data from single filesystem
+        config_key = list(fs_performance.keys())[0] if fs_performance else None
+        if config_key:
+            perf_data = fs_performance[config_key]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                "b-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax1.set_xlabel("Iteration")
+            ax1.set_ylabel("Vectors/Second")
+            ax1.set_title("Vector Insert Rate Performance")
+            ax1.grid(True, alpha=0.3)
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                "r-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax2.set_xlabel("Iteration")
+            ax2.set_ylabel("Total Time (seconds)")
+            ax2.set_title("Vector Insert Time Performance")
+            ax2.grid(True, alpha=0.3)
+
+    plt.tight_layout()
+    plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+    plt.close()
 
 
-def create_simple_performance_trends(results, output_dir):
-    """Create a simple performance trends chart for basic Milvus testing"""
+def create_heatmap_analysis(results, output_dir):
+    """Create multi-filesystem heatmap showing query performance"""
     if not results:
         return
-
-    # Separate baseline and dev results
-    baseline_results = [r for r in results if not r.get("is_dev", False)]
-    dev_results = [r for r in results if r.get("is_dev", False)]
-
-    if not baseline_results and not dev_results:
-        return
-
-    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-    # Prepare data
-    baseline_insert = []
-    baseline_query = []
-    dev_insert = []
-    dev_query = []
-    labels = []
-
-    # Process baseline results
-    for i, result in enumerate(baseline_results):
-        if "insert_performance" in result:
-            baseline_insert.append(result["insert_performance"].get("vectors_per_second", 0))
-        else:
-            baseline_insert.append(0)
+
+    # Group data by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "query_data": [],
+        "config_key": "",
+    })
+
+    for result in results:
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
 
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        baseline_query.append(query_qps)
-        labels.append(f"Run {i+1}")
-
-    # Process dev results
-    for result in dev_results:
-        if "insert_performance" in result:
-            dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
-        else:
-            dev_insert.append(0)
+
+        query_perf = result.get("query_performance", {})
+        for topk, topk_data in query_perf.items():
+            for batch, batch_data in topk_data.items():
+                qps = batch_data.get("queries_per_second", 0)
+                fs_performance[config_key]["query_data"].append({
+                    "topk": topk,
+                    "batch": batch,
+                    "qps": qps,
+                })
+        fs_performance[config_key]["config_key"] = config_key
+
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate heatmaps for each filesystem
+        num_fs = len(fs_performance)
+        fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
+        if num_fs == 1:
+            axes = [axes]
 
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        dev_query.append(query_qps)
-
-    x = range(len(baseline_results) if baseline_results else len(dev_results))
-
-    # Insert performance
-    if baseline_insert:
-        ax1.plot(x, baseline_insert, "o-", label="Baseline", linewidth=2, markersize=8)
-    if dev_insert:
-        ax1.plot(x[:len(dev_insert)], dev_insert, "s-", label="Development", linewidth=2, markersize=8)
-    ax1.set_xlabel("Test Run")
-    ax1.set_ylabel("Insert QPS")
-    ax1.set_title("Milvus Insert Performance")
-    ax1.set_xticks(x)
-    ax1.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
-    ax1.legend()
-    ax1.grid(True, alpha=0.3)
-
-    # Query performance
-    if baseline_query:
-        ax2.plot(x, baseline_query, "o-", label="Baseline", linewidth=2, markersize=8)
-    if dev_query:
-        ax2.plot(x[:len(dev_query)], dev_query, "s-", label="Development", linewidth=2, markersize=8)
-    ax2.set_xlabel("Test Run")
-    ax2.set_ylabel("Query QPS")
-    ax2.set_title("Milvus Query Performance")
-    ax2.set_xticks(x)
-    ax2.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
-    ax2.legend()
-    ax2.grid(True, alpha=0.3)
+        # Define common structure for consistency
+        topk_order = ["topk_1", "topk_10", "topk_100"]
+        batch_order = ["batch_1", "batch_10", "batch_100"]
+
+        for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+            # Create matrix for this filesystem
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
+            axes[idx].set_title(f"{config_key.upper()} Query Performance")
+            axes[idx].set_xticks(range(len(batch_order)))
+            axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            axes[idx].set_yticks(range(len(topk_order)))
+            axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
+                                   ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=axes[idx])
+            cbar.set_label('Queries Per Second (QPS)')
+    else:
+        # Single filesystem mode
+        fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+        if fs_performance:
+            config_key = list(fs_performance.keys())[0]
+            perf_data = fs_performance[config_key]
+
+            # Create matrix
+            topk_order = ["topk_1", "topk_10", "topk_100"]
+            batch_order = ["batch_1", "batch_10", "batch_100"]
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = ax.imshow(matrix, cmap='viridis', aspect='auto')
+            ax.set_title("Milvus Query Performance Heatmap")
+            ax.set_xticks(range(len(batch_order)))
+            ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            ax.set_yticks(range(len(topk_order)))
+            ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    ax.text(j, i, f'{matrix[i, j]:.0f}',
+                            ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=ax)
+            cbar.set_label('Queries Per Second (QPS)')
 
     plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
     plt.close()
 
 
-def generate_summary_statistics(results, output_dir):
-    """Generate summary statistics and save to JSON"""
-    summary = {
-        "total_tests": len(results),
-        "filesystems_tested": list(
-            set(r.get("filesystem", "unknown") for r in results)
-        ),
-        "configurations": {},
-        "performance_summary": {
-            "best_insert_qps": {"value": 0, "config": ""},
-            "best_query_qps": {"value": 0, "config": ""},
-            "average_insert_qps": 0,
-            "average_query_qps": 0,
-        },
-    }
-
-    # Calculate statistics
-    all_insert_qps = []
-    all_query_qps = []
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        is_dev = "dev" if result.get("is_dev", False) else "baseline"
-        config_name = f"{fs}-{block_size}-{is_dev}"
-
-        # Get actual performance metrics
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        all_insert_qps.append(insert_qps)
-        all_query_qps.append(query_qps)
-
-        summary["configurations"][config_name] = {
-            "insert_qps": insert_qps,
-            "query_qps": query_qps,
-            "host": result.get("host", "unknown"),
-        }
-
-        if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
-            summary["performance_summary"]["best_insert_qps"] = {
-                "value": insert_qps,
-                "config": config_name,
-            }
-
-        if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
-            summary["performance_summary"]["best_query_qps"] = {
-                "value": query_qps,
-                "config": config_name,
-            }
-
-    summary["performance_summary"]["average_insert_qps"] = (
-        np.mean(all_insert_qps) if all_insert_qps else 0
-    )
-    summary["performance_summary"]["average_query_qps"] = (
-        np.mean(all_query_qps) if all_query_qps else 0
-    )
-
-    # Save summary
-    with open(os.path.join(output_dir, "summary.json"), "w") as f:
-        json.dump(summary, f, indent=2)
-
-    return summary
-
-
 def main():
     if len(sys.argv) < 3:
         print("Usage: generate_graphs.py <results_dir> <output_dir>")
@@ -642,37 +340,23 @@ def main():
     results_dir = sys.argv[1]
     output_dir = sys.argv[2]
 
-    # Create output directory
+    # Ensure output directory exists
    os.makedirs(output_dir, exist_ok=True)
 
     # Load results
     results = load_results(results_dir)
-
     if not results:
-        print("No results found to analyze")
+        print(f"No valid results found in {results_dir}")
         sys.exit(1)
 
     print(f"Loaded {len(results)} result files")
 
     # Generate graphs
-    print("Generating performance heatmap...")
-    create_heatmap_analysis(results, output_dir)
-
-    print("Generating performance trends...")
     create_simple_performance_trends(results, output_dir)
+    create_heatmap_analysis(results, output_dir)
 
-    print("Generating summary statistics...")
-    summary = generate_summary_statistics(results, output_dir)
-
-    print(f"\nAnalysis complete! Graphs saved to {output_dir}")
-    print(f"Total configurations tested: {summary['total_tests']}")
-    print(
-        f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
-    )
-    print(
-        f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
-    )
+    print(f"Graphs generated in {output_dir}")
 
 
 if __name__ == "__main__":
-    main()
+    main()
\ No newline at end of file
diff --git a/playbooks/roles/ai_collect_results/files/generate_html_report.py b/playbooks/roles/ai_collect_results/files/generate_html_report.py
index a205577c..01ec734c 100755
--- a/playbooks/roles/ai_collect_results/files/generate_html_report.py
+++ b/playbooks/roles/ai_collect_results/files/generate_html_report.py
@@ -69,6 +69,24 @@ HTML_TEMPLATE = """
         color: #7f8c8d;
         font-size: 0.9em;
     }}
+    .config-box {{
+        background: #f8f9fa;
+        border-left: 4px solid #3498db;
+        padding: 15px;
+        margin: 20px 0;
+        border-radius: 4px;
+    }}
+    .config-box h3 {{
+        margin-top: 0;
+        color: #2c3e50;
+    }}
+    .config-box ul {{
+        margin: 10px 0;
+        padding-left: 20px;
+    }}
+    .config-box li {{
+        margin: 5px 0;
+    }}
     .section {{
         background: white;
         padding: 30px;
@@ -162,15 +180,16 @@ HTML_TEMPLATE = """
    -

    AI Vector Database Benchmark Results

    +

    Milvus Vector Database Benchmark Results

    Generated on {timestamp}
    @@ -192,34 +211,40 @@ HTML_TEMPLATE = """
    {best_query_config}
-

Test Runs

-
{total_tests}
-
Benchmark Executions
+

{fourth_card_title}

+
{fourth_card_value}
+
{fourth_card_label}
-
-

Performance Metrics

-

Key performance indicators for Milvus vector database operations.

+ {filesystem_comparison_section} + + {block_size_analysis_section} + +
+

Performance Heatmap

+

Heatmap visualization showing performance metrics across all tested configurations.

- Performance Metrics + Performance Heatmap
- ") # Footer + # Performance Graphs Section + html.append("
") + html.append("

📈 Performance Visualizations

") + html.append( + "

The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:

" + ) + html.append("
    ") + html.append( + "
  • Insert Performance: Shows vector insertion rates and times for each filesystem configuration
  • " + ) + html.append( + "
  • Query Performance: Displays query performance heatmaps for different Top-K and batch sizes
  • " + ) + html.append( + "
  • Index Performance: Compares index creation times across filesystems
  • " + ) + html.append( + "
  • Performance Matrix: Comprehensive comparison matrix of all metrics
  • " + ) + html.append( + "
  • Filesystem Comparison: Side-by-side comparison of filesystem performance
  • " + ) + html.append("
") + html.append( + "

Note: Graphs are generated as separate PNG files in the same directory as this report.

" + ) + html.append("
") + html.append( + " Insert Performance" + ) + html.append( + " Query Performance" + ) + html.append( + " Index Performance" + ) + html.append( + " Performance Matrix" + ) + html.append( + " Filesystem Comparison" + ) + html.append("
") + html.append("
") + html.append("
") html.append("

📝 Notes

") html.append("
    ") @@ -661,10 +868,11 @@ class ResultsAnalyzer: return "\n".join(html) except Exception as e: - self.logger.error(f"Error generating HTML report: {e}") - return ( - f"

    Error generating HTML report: {e}

    " - ) + import traceback + + tb = traceback.format_exc() + self.logger.error(f"Error generating HTML report: {e}\n{tb}") + return f"

    Error generating HTML report: {e}

    {tb}
    " def generate_graphs(self) -> bool: """Generate performance visualization graphs""" @@ -691,6 +899,9 @@ class ResultsAnalyzer: # Graph 4: Performance Comparison Matrix self._plot_performance_matrix() + # Graph 5: Multi-filesystem Comparison (if applicable) + self._plot_filesystem_comparison() + self.logger.info("Graphs generated successfully") return True @@ -699,34 +910,188 @@ class ResultsAnalyzer: return False def _plot_insert_performance(self): - """Plot insert performance metrics""" - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + """Plot insert performance metrics with node differentiation""" + # Group data by node + node_performance = {} - # Extract insert data - iterations = [] - insert_rates = [] - insert_times = [] + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": is_dev, + } - for i, result in enumerate(self.results_data): insert_perf = result.get("insert_performance", {}) if insert_perf: - iterations.append(i + 1) - insert_rates.append(insert_perf.get("vectors_per_second", 0)) - insert_times.append(insert_perf.get("total_time_seconds", 0)) - - # Plot insert rate - ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6) - ax1.set_xlabel("Iteration") - ax1.set_ylabel("Vectors/Second") - ax1.set_title("Vector Insert Rate Performance") - ax1.grid(True, alpha=0.3) - - # Plot insert time - ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6) - ax2.set_xlabel("Iteration") - ax2.set_ylabel("Total Time (seconds)") - ax2.set_title("Vector Insert Time Performance") - ax2.grid(True, alpha=0.3) + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + node_performance[hostname]["insert_times"].append( + insert_perf.get("total_time_seconds", 0) + ) + 
node_performance[hostname]["iterations"].append( + len(node_performance[hostname]["insert_rates"]) + ) + + # Check if we have multiple nodes + if len(node_performance) > 1: + # Multi-node mode: separate lines for each node + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7)) + + # Sort nodes with baseline first, then dev + sorted_nodes = sorted( + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + + # Create color palettes for baseline and dev nodes + baseline_colors = [ + "#2E7D32", + "#43A047", + "#66BB6A", + "#81C784", + "#A5D6A7", + "#C8E6C9", + ] # Greens + dev_colors = [ + "#0D47A1", + "#1565C0", + "#1976D2", + "#1E88E5", + "#2196F3", + "#42A5F5", + "#64B5F6", + ] # Blues + + # Additional colors if needed + extra_colors = [ + "#E65100", + "#F57C00", + "#FF9800", + "#FFB300", + "#FFC107", + "#FFCA28", + ] # Oranges + + # Line styles to cycle through + line_styles = ["-", "--", "-.", ":"] + markers = ["o", "s", "^", "v", "D", "p", "*", "h"] + + baseline_idx = 0 + dev_idx = 0 + + # Use different colors and styles for each node + for idx, (hostname, perf_data) in enumerate(sorted_nodes): + if not perf_data["insert_rates"]: + continue + + # Choose color and style based on node type and index + if perf_data["is_dev"]: + # Development nodes - blues + color = dev_colors[dev_idx % len(dev_colors)] + linestyle = line_styles[ + (dev_idx // len(dev_colors)) % len(line_styles) + ] + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev + label = f"{hostname} (Dev)" + dev_idx += 1 + else: + # Baseline nodes - greens + color = baseline_colors[baseline_idx % len(baseline_colors)] + linestyle = line_styles[ + (baseline_idx // len(baseline_colors)) % len(line_styles) + ] + marker = markers[ + baseline_idx % 4 + ] # Use first 4 markers for baseline + label = f"{hostname} (Baseline)" + baseline_idx += 1 + + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate with alpha for better visibility + ax1.plot( + iterations, 
+ perf_data["insert_rates"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title("Milvus Insert Rate by Node") + ax1.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Milvus Insert Time by Node") + ax2.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + plt.suptitle( + "Insert Performance Analysis: Baseline vs Development", + fontsize=14, + y=1.02, + ) + else: + # Single node mode: original behavior + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + + # Extract insert data from single node + hostname = list(node_performance.keys())[0] if node_performance else None + if hostname: + perf_data = node_performance[hostname] + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate + ax1.plot( + iterations, + perf_data["insert_rates"], + "b-o", + linewidth=2, + markersize=6, + ) + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title(f"Vector Insert Rate Performance - {hostname}") + ax1.grid(True, alpha=0.3) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + "r-o", + linewidth=2, + markersize=6, + ) + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title(f"Vector Insert Time Performance - {hostname}") + ax2.grid(True, alpha=0.3) plt.tight_layout() output_file = 
os.path.join( @@ -739,52 +1104,110 @@ class ResultsAnalyzer: plt.close() def _plot_query_performance(self): - """Plot query performance metrics""" + """Plot query performance metrics comparing baseline vs dev nodes""" if not self.results_data: return - # Collect query performance data - query_data = [] + # Group data by filesystem configuration + fs_groups = {} for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} + query_perf = result.get("query_performance", {}) - for topk, topk_data in query_perf.items(): - for batch, batch_data in topk_data.items(): - query_data.append( - { - "topk": topk.replace("topk_", ""), - "batch": batch.replace("batch_", ""), - "qps": batch_data.get("queries_per_second", 0), - "avg_time": batch_data.get("average_time_seconds", 0) - * 1000, # Convert to ms - } - ) + if query_perf: + node_type = "dev" if is_dev else "baseline" + for topk, topk_data in query_perf.items(): + for batch, batch_data in topk_data.items(): + fs_groups[config_key][node_type].append( + { + "hostname": hostname, + "topk": topk.replace("topk_", ""), + "batch": batch.replace("batch_", ""), + "qps": batch_data.get("queries_per_second", 0), + "avg_time": batch_data.get("average_time_seconds", 0) + * 1000, + } + ) - if not query_data: + if not fs_groups: return - df = pd.DataFrame(query_data) + # Create subplots for each filesystem config + n_configs = len(fs_groups) + fig_height = max(8, 4 * n_configs) + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height)) - # Create subplots - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + if n_configs == 1: + axes = axes.reshape(1, -1) - # QPS heatmap - qps_pivot = df.pivot_table( - values="qps", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd") - 
ax1.set_title("Queries Per Second (QPS)") - ax1.set_xlabel("Batch Size") - ax1.set_ylabel("Top-K") - - # Latency heatmap - latency_pivot = df.pivot_table( - values="avg_time", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd") - ax2.set_title("Average Query Latency (ms)") - ax2.set_xlabel("Batch Size") - ax2.set_ylabel("Top-K") + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())): + # Create DataFrames for baseline and dev + baseline_df = ( + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame() + ) + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame() + + # Baseline QPS heatmap + ax_base = axes[idx][0] + if not baseline_df.empty: + baseline_pivot = baseline_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + baseline_pivot, + annot=True, + fmt=".1f", + ax=ax_base, + cmap="Greens", + cbar_kws={"label": "QPS"}, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + ax_base.set_xlabel("Batch Size") + ax_base.set_ylabel("Top-K") + else: + ax_base.text( + 0.5, + 0.5, + f"No baseline data for {config_key}", + ha="center", + va="center", + transform=ax_base.transAxes, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + # Dev QPS heatmap + ax_dev = axes[idx][1] + if not dev_df.empty: + dev_pivot = dev_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + dev_pivot, + annot=True, + fmt=".1f", + ax=ax_dev, + cmap="Blues", + cbar_kws={"label": "QPS"}, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + ax_dev.set_xlabel("Batch Size") + ax_dev.set_ylabel("Top-K") + else: + ax_dev.text( + 0.5, + 0.5, + f"No dev data for {config_key}", + ha="center", + va="center", + transform=ax_dev.transAxes, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + + plt.suptitle("Query Performance: Baseline vs Development", 
fontsize=16, y=1.02) plt.tight_layout() output_file = os.path.join( self.output_dir, @@ -796,32 +1219,101 @@ class ResultsAnalyzer: plt.close() def _plot_index_performance(self): - """Plot index creation performance""" - iterations = [] - index_times = [] + """Plot index creation performance comparing baseline vs dev""" + # Group by filesystem configuration + fs_groups = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} - for i, result in enumerate(self.results_data): index_perf = result.get("index_performance", {}) if index_perf: - iterations.append(i + 1) - index_times.append(index_perf.get("creation_time_seconds", 0)) + time = index_perf.get("creation_time_seconds", 0) + if time > 0: + node_type = "dev" if is_dev else "baseline" + fs_groups[config_key][node_type].append(time) - if not index_times: + if not fs_groups: return - plt.figure(figsize=(10, 6)) - plt.bar(iterations, index_times, alpha=0.7, color="green") - plt.xlabel("Iteration") - plt.ylabel("Index Creation Time (seconds)") - plt.title("Index Creation Performance") - plt.grid(True, alpha=0.3) - - # Add average line - avg_time = np.mean(index_times) - plt.axhline( - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s" + # Create comparison bar chart + fig, ax = plt.subplots(figsize=(14, 8)) + + configs = sorted(fs_groups.keys()) + x = np.arange(len(configs)) + width = 0.35 + + # Calculate averages for each config + baseline_avgs = [] + dev_avgs = [] + baseline_stds = [] + dev_stds = [] + + for config in configs: + baseline_times = fs_groups[config]["baseline"] + dev_times = fs_groups[config]["dev"] + + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0) + dev_avgs.append(np.mean(dev_times) if dev_times else 0) + baseline_stds.append(np.std(baseline_times) if 
baseline_times else 0) + dev_stds.append(np.std(dev_times) if dev_times else 0) + + # Create bars + bars1 = ax.bar( + x - width / 2, + baseline_avgs, + width, + yerr=baseline_stds, + label="Baseline", + color="#4CAF50", + capsize=5, + ) + bars2 = ax.bar( + x + width / 2, + dev_avgs, + width, + yerr=dev_stds, + label="Development", + color="#2196F3", + capsize=5, ) - plt.legend() + + # Add value labels on bars + for bar, val in zip(bars1, baseline_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + for bar, val in zip(bars2, dev_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + ax.set_xlabel("Filesystem Configuration", fontsize=12) + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12) + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14) + ax.set_xticks(x) + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right") + ax.legend(loc="upper right") + ax.grid(True, alpha=0.3, axis="y") output_file = os.path.join( self.output_dir, @@ -833,61 +1325,148 @@ class ResultsAnalyzer: plt.close() def _plot_performance_matrix(self): - """Plot comprehensive performance comparison matrix""" + """Plot performance comparison matrix for each filesystem config""" if len(self.results_data) < 2: return - # Extract key metrics for comparison - metrics = [] - for i, result in enumerate(self.results_data): + # Group by filesystem configuration + fs_metrics = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_metrics: + fs_metrics[config_key] = {"baseline": [], "dev": []} + + # Collect metrics insert_perf = result.get("insert_performance", {}) 
index_perf = result.get("index_performance", {}) + query_perf = result.get("query_performance", {}) metric = { - "iteration": i + 1, + "hostname": hostname, "insert_rate": insert_perf.get("vectors_per_second", 0), "index_time": index_perf.get("creation_time_seconds", 0), } - # Add query metrics - query_perf = result.get("query_performance", {}) + # Get representative query performance (topk_10, batch_1) if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: metric["query_qps"] = query_perf["topk_10"]["batch_1"].get( "queries_per_second", 0 ) + else: + metric["query_qps"] = 0 - metrics.append(metric) + node_type = "dev" if is_dev else "baseline" + fs_metrics[config_key][node_type].append(metric) - df = pd.DataFrame(metrics) + if not fs_metrics: + return - # Normalize metrics for comparison - numeric_cols = ["insert_rate", "index_time", "query_qps"] - for col in numeric_cols: - if col in df.columns: - df[f"{col}_norm"] = (df[col] - df[col].min()) / ( - df[col].max() - df[col].min() + 1e-6 - ) + # Create subplots for each filesystem + n_configs = len(fs_metrics) + n_cols = min(3, n_configs) + n_rows = (n_configs + n_cols - 1) // n_cols + + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5)) + if n_rows == 1 and n_cols == 1: + axes = [[axes]] + elif n_rows == 1: + axes = [axes] + elif n_cols == 1: + axes = [[ax] for ax in axes] + + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())): + row = idx // n_cols + col = idx % n_cols + ax = axes[row][col] + + # Calculate averages + baseline_metrics = data["baseline"] + dev_metrics = data["dev"] + + if baseline_metrics and dev_metrics: + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"] + + baseline_avg = [ + np.mean([m["insert_rate"] for m in baseline_metrics]), + np.mean([m["index_time"] for m in baseline_metrics]), + np.mean([m["query_qps"] for m in baseline_metrics]), + ] - # Create radar chart - fig, ax = plt.subplots(figsize=(10, 8), 
subplot_kw=dict(projection="polar")) + dev_avg = [ + np.mean([m["insert_rate"] for m in dev_metrics]), + np.mean([m["index_time"] for m in dev_metrics]), + np.mean([m["query_qps"] for m in dev_metrics]), + ] - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist() - angles += angles[:1] # Complete the circle + x = np.arange(len(categories)) + width = 0.35 - for i, row in df.iterrows(): - values = [row.get(f"{col}_norm", 0) for col in numeric_cols] - values += values[:1] # Complete the circle + bars1 = ax.bar( + x - width / 2, + baseline_avg, + width, + label="Baseline", + color="#4CAF50", + ) + bars2 = ax.bar( + x + width / 2, dev_avg, width, label="Development", color="#2196F3" + ) - ax.plot( - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}' - ) - ax.fill(angles, values, alpha=0.25) + # Add value labels + for bar, val in zip(bars1, baseline_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) - ax.set_xticks(angles[:-1]) - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"]) - ax.set_ylim(0, 1) - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08) - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0)) + for bar, val in zip(bars2, dev_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) + + ax.set_xlabel("Metrics") + ax.set_ylabel("Value") + ax.set_title(f"{config_key.upper()}") + ax.set_xticks(x) + ax.set_xticklabels(categories) + ax.legend(loc="upper right", fontsize=8) + ax.grid(True, alpha=0.3, axis="y") + else: + ax.text( + 0.5, + 0.5, + f"Insufficient data\nfor {config_key}", + ha="center", + va="center", + transform=ax.transAxes, + ) + ax.set_title(f"{config_key.upper()}") + + # Hide unused subplots + for idx 
in range(n_configs, n_rows * n_cols): + row = idx // n_cols + col = idx % n_cols + axes[row][col].set_visible(False) + + plt.suptitle( + "Performance Comparison Matrix: Baseline vs Development", + fontsize=14, + y=1.02, + ) output_file = os.path.join( self.output_dir, @@ -898,6 +1477,149 @@ class ResultsAnalyzer: ) plt.close() + def _plot_filesystem_comparison(self): + """Plot node performance comparison chart""" + if len(self.results_data) < 2: + return + + # Group results by node + node_performance = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "index_times": [], + "query_qps": [], + "is_dev": is_dev, + } + + # Collect metrics + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + + index_perf = result.get("index_performance", {}) + if index_perf: + node_performance[hostname]["index_times"].append( + index_perf.get("creation_time_seconds", 0) + ) + + # Get top-10 batch-1 query performance as representative + query_perf = result.get("query_performance", {}) + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0) + node_performance[hostname]["query_qps"].append(qps) + + # Only create comparison if we have multiple nodes + if len(node_performance) > 1: + # Calculate averages + node_metrics = {} + for hostname, perf_data in node_performance.items(): + node_metrics[hostname] = { + "avg_insert_rate": ( + np.mean(perf_data["insert_rates"]) + if perf_data["insert_rates"] + else 0 + ), + "avg_index_time": ( + np.mean(perf_data["index_times"]) + if perf_data["index_times"] + else 0 + ), + "avg_query_qps": ( + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0 + ), + "is_dev": perf_data["is_dev"], + } + + # Create 
comparison bar chart with more space + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8)) + + # Sort nodes with baseline first + sorted_nodes = sorted( + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + node_names = [hostname for hostname, _ in sorted_nodes] + + # Use different colors for baseline vs dev + colors = [ + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3" + for hostname in node_names + ] + + # Add labels for clarity + labels = [ + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})" + for hostname in node_names + ] + + # Insert rate comparison + insert_rates = [ + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names + ] + bars1 = ax1.bar(labels, insert_rates, color=colors) + ax1.set_title("Average Milvus Insert Rate by Node") + ax1.set_ylabel("Vectors/Second") + # Rotate labels for better readability + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Index time comparison (lower is better) + index_times = [ + node_metrics[hostname]["avg_index_time"] for hostname in node_names + ] + bars2 = ax2.bar(labels, index_times, color=colors) + ax2.set_title("Average Milvus Index Time by Node") + ax2.set_ylabel("Seconds (Lower is Better)") + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Query QPS comparison + query_qps = [ + node_metrics[hostname]["avg_query_qps"] for hostname in node_names + ] + bars3 = ax3.bar(labels, query_qps, color=colors) + ax3.set_title("Average Milvus Query QPS by Node") + ax3.set_ylabel("Queries/Second") + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Add value labels on bars + for bars, values in [ + (bars1, insert_rates), + (bars2, index_times), + (bars3, query_qps), + ]: + for bar, value in zip(bars, values): + height = bar.get_height() + ax = bar.axes + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height + height * 0.01, + f"{value:.1f}", + ha="center", + va="bottom", + 
fontsize=10, + ) + + plt.suptitle( + "Milvus Performance Comparison: Baseline vs Development Nodes", + fontsize=16, + y=1.02, + ) + plt.tight_layout() + + output_file = os.path.join( + self.output_dir, + f"filesystem_comparison.{self.config.get('graph_format', 'png')}", + ) + plt.savefig( + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight" + ) + plt.close() + def analyze(self) -> bool: """Run complete analysis""" self.logger.info("Starting results analysis...") diff --git a/workflows/ai/scripts/generate_graphs.py b/workflows/ai/scripts/generate_graphs.py index 2e183e86..fafc62bf 100755 --- a/workflows/ai/scripts/generate_graphs.py +++ b/workflows/ai/scripts/generate_graphs.py @@ -9,7 +9,6 @@ import sys import glob import numpy as np import matplotlib - matplotlib.use("Agg") # Use non-interactive backend import matplotlib.pyplot as plt from datetime import datetime @@ -17,6 +16,66 @@ from pathlib import Path from collections import defaultdict +def _extract_filesystem_config(result): + """Extract filesystem type and block size from result data. 
+    Returns (fs_type, block_size, config_key)"""
+    filename = result.get("_file", "")
+
+    # Primary: Extract filesystem type from filename (more reliable than JSON)
+    fs_type = "unknown"
+    block_size = "default"
+
+    if "xfs" in filename:
+        fs_type = "xfs"
+        # Check larger sizes first to avoid substring matches
+        if "64k-" in filename:
+            block_size = "64k"
+        elif "32k-" in filename:
+            block_size = "32k"
+        elif "16k-" in filename:
+            block_size = "16k"
+        elif "4k-" in filename:
+            block_size = "4k"
+    elif "ext4" in filename:
+        fs_type = "ext4"
+        if "16k-" in filename:
+            block_size = "16k"
+        elif "4k-" in filename:
+            block_size = "4k"
+    elif "btrfs" in filename:
+        fs_type = "btrfs"
+
+    # Fallback: Check JSON data if filename parsing failed
+    if fs_type == "unknown":
+        fs_type = result.get("filesystem", "unknown")
+
+    # Create descriptive config key
+    config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+    return fs_type, block_size, config_key
+
+
+def _extract_node_info(result):
+    """Extract node hostname and determine if it's a dev node.
+    Returns (hostname, is_dev_node)"""
+    # Get hostname from system_info (preferred) or fall back to filename
+    system_info = result.get("system_info", {})
+    hostname = system_info.get("hostname", "")
+
+    # If no hostname in system_info, try extracting from filename
+    if not hostname:
+        filename = result.get("_file", "")
+        # Remove results_ prefix and .json suffix
+        hostname = filename.replace("results_", "").replace(".json", "")
+        # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit(): + hostname = "_".join(hostname.split("_")[:-1]) + + # Determine if this is a dev node + is_dev = hostname.endswith("-dev") + + return hostname, is_dev + + def load_results(results_dir): """Load all JSON result files from the directory""" results = [] @@ -27,63 +86,8 @@ def load_results(results_dir): try: with open(json_file, "r") as f: data = json.load(f) - # Extract filesystem info - prefer from JSON data over filename - filename = os.path.basename(json_file) - - # First, try to get filesystem from the JSON data itself - fs_type = data.get("filesystem", None) - - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type: - parts = ( - filename.replace("results_", "").replace(".json", "").split("-") - ) - - # Parse host info - if "debian13-ai-" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") - .replace("_2.json", "") - .replace("_3.json", "") - .split("-") - ) - if "xfs" in host_parts[0]: - fs_type = "xfs" - # Extract block size (e.g., "4k", "16k", etc.) 
- block_size = ( - host_parts[1] if len(host_parts) > 1 else "unknown" - ) - elif "ext4" in host_parts[0]: - fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: - fs_type = "btrfs" - block_size = "default" - else: - fs_type = "unknown" - block_size = "unknown" - else: - fs_type = "unknown" - block_size = "unknown" - else: - # If filesystem came from JSON, set appropriate block size - if fs_type == "btrfs": - block_size = "default" - elif fs_type in ["ext4", "xfs"]: - block_size = data.get("block_size", "4k") - else: - block_size = data.get("block_size", "default") - - is_dev = "dev" in filename - - # Use filesystem from JSON if available, otherwise use parsed value - if "filesystem" not in data: - data["filesystem"] = fs_type - data["block_size"] = block_size - data["is_dev"] = is_dev - data["filename"] = filename - + # Add filename for filesystem detection + data["_file"] = os.path.basename(json_file) results.append(data) except Exception as e: print(f"Error loading {json_file}: {e}") @@ -91,1023 +95,240 @@ def load_results(results_dir): return results -def create_filesystem_comparison_chart(results, output_dir): - """Create a bar chart comparing performance across filesystems""" - # Group by filesystem and baseline/dev - fs_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract actual performance data from results - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - fs_data[fs][category].append(insert_qps) - - # Prepare data for plotting - filesystems = list(fs_data.keys()) - baseline_means = [ - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0 - for fs in filesystems - ] - dev_means = [ - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in 
filesystems - ] - - x = np.arange(len(filesystems)) - width = 0.35 - - fig, ax = plt.subplots(figsize=(10, 6)) - baseline_bars = ax.bar( - x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4" - ) - dev_bars = ax.bar( - x + width / 2, dev_means, width, label="Development", color="#ff7f0e" - ) - - ax.set_xlabel("Filesystem") - ax.set_ylabel("Insert QPS") - ax.set_title("Vector Database Performance by Filesystem") - ax.set_xticks(x) - ax.set_xticklabels(filesystems) - ax.legend() - ax.grid(True, alpha=0.3) - - # Add value labels on bars - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150) - plt.close() - - -def create_block_size_analysis(results, output_dir): - """Create analysis for different block sizes (XFS specific)""" - # Filter XFS results - xfs_results = [r for r in results if r.get("filesystem") == "xfs"] - - if not xfs_results: +def create_simple_performance_trends(results, output_dir): + """Create multi-node performance trends chart""" + if not results: return - # Group by block size - block_size_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in xfs_results: - block_size = result.get("block_size", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - block_size_data[block_size][category].append(insert_qps) - - # Sort block sizes - block_sizes = sorted( - block_size_data.keys(), - key=lambda x: ( - int(x.replace("k", "").replace("s", "")) - if x not in ["unknown", "default"] - else 0 - ), - ) - - # Create grouped bar chart - baseline_means = [ - ( 
- np.mean(block_size_data[bs]["baseline"]) - if block_size_data[bs]["baseline"] - else 0 - ) - for bs in block_sizes - ] - dev_means = [ - np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0 - for bs in block_sizes - ] - - x = np.arange(len(block_sizes)) - width = 0.35 - - fig, ax = plt.subplots(figsize=(12, 6)) - baseline_bars = ax.bar( - x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c" - ) - dev_bars = ax.bar( - x + width / 2, dev_means, width, label="Development", color="#d62728" - ) - - ax.set_xlabel("Block Size") - ax.set_ylabel("Insert QPS") - ax.set_title("XFS Performance by Block Size") - ax.set_xticks(x) - ax.set_xticklabels(block_sizes) - ax.legend() - ax.grid(True, alpha=0.3) - - # Add value labels - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150) - plt.close() - - -def create_heatmap_analysis(results, output_dir): - """Create a heatmap showing AVERAGE performance across all test iterations""" - # Group data by configuration and version, collecting ALL values for averaging - config_data = defaultdict( - lambda: { - "baseline": {"insert": [], "query": [], "count": 0}, - "dev": {"insert": [], "query": [], "count": 0}, - } - ) + # Group results by node + node_performance = defaultdict(lambda: { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": False, + }) for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "default") - config = f"{fs}-{block_size}" - version = "dev" if result.get("is_dev", False) else "baseline" - - # Get actual insert performance - if "insert_performance" in result: - insert_qps = 
result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - # Calculate average query QPS - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - - # Collect all values for averaging - config_data[config][version]["insert"].append(insert_qps) - config_data[config][version]["query"].append(query_qps) - config_data[config][version]["count"] += 1 - - # Sort configurations - configs = sorted(config_data.keys()) - - # Calculate averages for heatmap - insert_baseline = [] - insert_dev = [] - query_baseline = [] - query_dev = [] - iteration_counts = {"baseline": 0, "dev": 0} - - for c in configs: - # Calculate average insert QPS - baseline_insert_vals = config_data[c]["baseline"]["insert"] - insert_baseline.append( - np.mean(baseline_insert_vals) if baseline_insert_vals else 0 - ) - - dev_insert_vals = config_data[c]["dev"]["insert"] - insert_dev.append(np.mean(dev_insert_vals) if dev_insert_vals else 0) - - # Calculate average query QPS - baseline_query_vals = config_data[c]["baseline"]["query"] - query_baseline.append( - np.mean(baseline_query_vals) if baseline_query_vals else 0 - ) - - dev_query_vals = config_data[c]["dev"]["query"] - query_dev.append(np.mean(dev_query_vals) if dev_query_vals else 0) - - # Track iteration counts - iteration_counts["baseline"] = max( - iteration_counts["baseline"], len(baseline_insert_vals) - ) - iteration_counts["dev"] = max(iteration_counts["dev"], len(dev_insert_vals)) - - # Create figure with custom heatmap - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8)) - - # Create data matrices - insert_data = np.array([insert_baseline, insert_dev]).T - 
query_data = np.array([query_baseline, query_dev]).T - - # Insert QPS heatmap - im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto") - ax1.set_xticks([0, 1]) - ax1.set_xticklabels(["Baseline", "Development"]) - ax1.set_yticks(range(len(configs))) - ax1.set_yticklabels(configs) - ax1.set_title( - f"Insert Performance - AVERAGE across {iteration_counts['baseline']} iterations\n(1M vectors, 128 dims, HNSW index)" - ) - ax1.set_ylabel("Configuration") - - # Add text annotations with dynamic color based on background - # Get the colormap to determine actual colors - cmap1 = plt.cm.YlOrRd - norm1 = plt.Normalize(vmin=insert_data.min(), vmax=insert_data.max()) - - for i in range(len(configs)): - for j in range(2): - # Get the actual color from the colormap - val = insert_data[i, j] - rgba = cmap1(norm1(val)) - # Calculate luminance using standard formula - # Perceived luminance: 0.299*R + 0.587*G + 0.114*B - luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2] - # Use white text on dark backgrounds (low luminance) - text_color = "white" if luminance < 0.5 else "black" + hostname, is_dev = _extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": is_dev, + } - # Show average value with indicator - text = ax1.text( - j, - i, - f"{int(insert_data[i, j])}\n(avg)", - ha="center", - va="center", - color=text_color, - fontweight="bold", - fontsize=9, + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) ) - - # Add colorbar - cbar1 = plt.colorbar(im1, ax=ax1) - cbar1.set_label("Insert QPS") - - # Query QPS heatmap - im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto") - ax2.set_xticks([0, 1]) - ax2.set_xticklabels(["Baseline", "Development"]) - ax2.set_yticks(range(len(configs))) - ax2.set_yticklabels(configs) - ax2.set_title( 
-        f"Query Performance - AVERAGE across {iteration_counts['dev']} iterations\n(1M vectors, 128 dims, HNSW index)"
-    )
-
-    # Add text annotations with dynamic color based on background
-    # Get the colormap to determine actual colors
-    cmap2 = plt.cm.YlGnBu
-    norm2 = plt.Normalize(vmin=query_data.min(), vmax=query_data.max())
-
-    for i in range(len(configs)):
-        for j in range(2):
-            # Get the actual color from the colormap
-            val = query_data[i, j]
-            rgba = cmap2(norm2(val))
-            # Calculate luminance using standard formula
-            # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
-            luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
-            # Use white text on dark backgrounds (low luminance)
-            text_color = "white" if luminance < 0.5 else "black"
-
-            # Show average value with indicator
-            text = ax2.text(
-                j,
-                i,
-                f"{int(query_data[i, j])}\n(avg)",
-                ha="center",
-                va="center",
-                color=text_color,
-                fontweight="bold",
-                fontsize=9,
-            )
+            node_performance[hostname]["insert_times"].append(
+                insert_perf.get("total_time_seconds", 0)
+            )
+            node_performance[hostname]["iterations"].append(
+                len(node_performance[hostname]["insert_rates"])
+            )
+
-    # Add colorbar
-    cbar2 = plt.colorbar(im2, ax=ax2)
-    cbar2.set_label("Query QPS")
-
-    # Add overall figure title
-    fig.suptitle(
-        "Performance Heatmap - Showing AVERAGES across Multiple Test Iterations",
-        fontsize=14,
-        fontweight="bold",
-        y=1.02,
-    )
-
-    plt.tight_layout()
-    plt.savefig(
-        os.path.join(output_dir, "performance_heatmap.png"),
-        dpi=150,
-        bbox_inches="tight",
-    )
-    plt.close()
-
-
-def create_performance_trends(results, output_dir):
-    """Create line charts showing performance trends"""
-    # Group by filesystem type
-    fs_types = defaultdict(
-        lambda: {
-            "configs": [],
-            "baseline_insert": [],
-            "dev_insert": [],
-            "baseline_query": [],
-            "dev_query": [],
-        }
-    )
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{block_size}"
-
-        if config not in fs_types[fs]["configs"]:
-            fs_types[fs]["configs"].append(config)
-            fs_types[fs]["baseline_insert"].append(0)
-            fs_types[fs]["dev_insert"].append(0)
-            fs_types[fs]["baseline_query"].append(0)
-            fs_types[fs]["dev_query"].append(0)
-
-        idx = fs_types[fs]["configs"].index(config)
-
-        # Calculate average query QPS from all test configurations
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        if result.get("is_dev", False):
-            if "insert_performance" in result:
-                fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["dev_query"][idx] = query_qps
-        else:
-            if "insert_performance" in result:
-                fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["baseline_query"][idx] = query_qps
-
-    # Create separate plots for each filesystem
-    for fs, data in fs_types.items():
-        if not data["configs"]:
-            continue
-
-        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-        x = range(len(data["configs"]))
-
-        # Insert performance
-        ax1.plot(
-            x,
-            data["baseline_insert"],
-            "o-",
-            label="Baseline",
-            linewidth=2,
-            markersize=8,
-        )
-        ax1.plot(
-            x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax1.set_xlabel("Configuration")
-        ax1.set_ylabel("Insert QPS")
-        ax1.set_title(f"{fs.upper()} Insert Performance")
-        ax1.set_xticks(x)
-        ax1.set_xticklabels(data["configs"])
-        ax1.legend()
+    # Check if we have multi-node data
+    if len(node_performance) > 1:
+        # Multi-node mode: separate lines for each node
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        colors = ["b", "r", "g", "m", "c", "y", "k"]
+        color_idx = 0
+
+        for hostname, perf_data in node_performance.items():
+            if not perf_data["insert_rates"]:
+                continue
+
+            color = colors[color_idx % len(colors)]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=hostname,
+            )
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=hostname,
+            )
+
+            color_idx += 1
+
+        ax1.set_xlabel("Iteration")
+        ax1.set_ylabel("Vectors/Second")
+        ax1.set_title("Milvus Insert Rate by Node")
         ax1.grid(True, alpha=0.3)
-
-        # Query performance
-        ax2.plot(
-            x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
-        )
-        ax2.plot(
-            x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax2.set_xlabel("Configuration")
-        ax2.set_ylabel("Query QPS")
-        ax2.set_title(f"{fs.upper()} Query Performance")
-        ax2.set_xticks(x)
-        ax2.set_xticklabels(data["configs"])
-        ax2.legend()
+        ax1.legend()
+
+        ax2.set_xlabel("Iteration")
+        ax2.set_ylabel("Total Time (seconds)")
+        ax2.set_title("Milvus Insert Time by Node")
         ax2.grid(True, alpha=0.3)
-
-        plt.tight_layout()
-        plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
-        plt.close()
-
-
-def create_simple_performance_trends(results, output_dir):
-    """Create a simple performance trends chart for basic Milvus testing"""
-    if not results:
-        return
-
-    # Extract configuration from first result for display
-    config_text = ""
-    if results:
-        first_result = results[0]
-        if "config" in first_result:
-            cfg = first_result["config"]
-            config_text = (
-                f"Test Config:\n"
-                f"• {cfg.get('vector_dataset_size', 'N/A'):,} vectors/iteration\n"
-                f"• {cfg.get('vector_dimensions', 'N/A')} dimensions\n"
-                f"• {cfg.get('index_type', 'N/A')} index"
-            )
+        ax2.legend()
+    else:
+        # Single node mode: original behavior
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        # Extract insert data from the single node
+        hostname = list(node_performance.keys())[0] if node_performance else None
+        if hostname:
+            perf_data = node_performance[hostname]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                "b-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax1.set_xlabel("Iteration")
+            ax1.set_ylabel("Vectors/Second")
+            ax1.set_title("Vector Insert Rate Performance")
+            ax1.grid(True, alpha=0.3)
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                "r-o",
+                linewidth=2,
+                markersize=6,
+            )
-
-    # Separate baseline and dev results
-    baseline_results = [r for r in results if not r.get("is_dev", False)]
-    dev_results = [r for r in results if r.get("is_dev", False)]
-
-    if not baseline_results and not dev_results:
-        return
-
-    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-    # Prepare data
-    baseline_insert = []
-    baseline_query = []
-    dev_insert = []
-    dev_query = []
-    labels = []
-
-    # Process baseline results
-    for i, result in enumerate(baseline_results):
-        if "insert_performance" in result:
-            baseline_insert.append(
-                result["insert_performance"].get("vectors_per_second", 0)
-            )
-        else:
-            baseline_insert.append(0)
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        baseline_query.append(query_qps)
-        labels.append(f"Iteration {i+1}")
-
-    # Process dev results
-    for result in 
dev_results: - if "insert_performance" in result: - dev_insert.append(result["insert_performance"].get("vectors_per_second", 0)) - else: - dev_insert.append(0) - - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - dev_query.append(query_qps) - - x = range(len(baseline_results) if baseline_results else len(dev_results)) - - # Insert performance - with visible markers for all points - if baseline_insert: - # Line plot with smaller markers - ax1.plot( - x, - baseline_insert, - "-", - label="Baseline", - linewidth=1.5, - color="blue", - alpha=0.6, - ) - # Add distinct markers for each point - ax1.scatter( - x, - baseline_insert, - s=30, - color="blue", - alpha=0.8, - edgecolors="darkblue", - linewidth=0.5, - zorder=5, - ) - if dev_insert: - # Line plot with smaller markers - ax1.plot( - x[: len(dev_insert)], - dev_insert, - "-", - label="Development", - linewidth=1.5, - color="red", - alpha=0.6, - ) - # Add distinct markers for each point - ax1.scatter( - x[: len(dev_insert)], - dev_insert, - s=30, - color="red", - alpha=0.8, - edgecolors="darkred", - linewidth=0.5, - marker="s", - zorder=5, - ) - ax1.set_xlabel("Test Iteration (same configuration, repeated for reliability)") - ax1.set_ylabel("Insert QPS (vectors/second)") - ax1.set_title("Milvus Insert Performance") - - # Handle x-axis labels to prevent overlap - num_points = len(x) - if num_points > 20: - # Show every 5th label for many iterations - step = 5 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax1.set_xticks(tick_positions) - 
ax1.set_xticklabels(tick_labels, rotation=45, ha="right") - elif num_points > 10: - # Show every 2nd label for moderate iterations - step = 2 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax1.set_xticks(tick_positions) - ax1.set_xticklabels(tick_labels, rotation=45, ha="right") - else: - # Show all labels for few iterations - ax1.set_xticks(x) - ax1.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x]) - - ax1.legend() - ax1.grid(True, alpha=0.3) - - # Add configuration text box - compact - if config_text: - ax1.text( - 0.02, - 0.98, - config_text, - transform=ax1.transAxes, - fontsize=6, - verticalalignment="top", - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85), - ) - - # Query performance - with visible markers for all points - if baseline_query: - # Line plot - ax2.plot( - x, - baseline_query, - "-", - label="Baseline", - linewidth=1.5, - color="blue", - alpha=0.6, - ) - # Add distinct markers for each point - ax2.scatter( - x, - baseline_query, - s=30, - color="blue", - alpha=0.8, - edgecolors="darkblue", - linewidth=0.5, - zorder=5, - ) - if dev_query: - # Line plot - ax2.plot( - x[: len(dev_query)], - dev_query, - "-", - label="Development", - linewidth=1.5, - color="red", - alpha=0.6, - ) - # Add distinct markers for each point - ax2.scatter( - x[: len(dev_query)], - dev_query, - s=30, - color="red", - alpha=0.8, - edgecolors="darkred", - linewidth=0.5, - marker="s", - zorder=5, - ) - ax2.set_xlabel("Test Iteration (same configuration, repeated for reliability)") - ax2.set_ylabel("Query QPS (queries/second)") - ax2.set_title("Milvus Query Performance") - - # Handle x-axis labels to prevent overlap - num_points = len(x) - if num_points > 20: - # Show every 5th label for many iterations - step = 5 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in 
tick_positions - ] - ax2.set_xticks(tick_positions) - ax2.set_xticklabels(tick_labels, rotation=45, ha="right") - elif num_points > 10: - # Show every 2nd label for moderate iterations - step = 2 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax2.set_xticks(tick_positions) - ax2.set_xticklabels(tick_labels, rotation=45, ha="right") - else: - # Show all labels for few iterations - ax2.set_xticks(x) - ax2.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x]) - - ax2.legend() - ax2.grid(True, alpha=0.3) - - # Add configuration text box - compact - if config_text: - ax2.text( - 0.02, - 0.98, - config_text, - transform=ax2.transAxes, - fontsize=6, - verticalalignment="top", - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85), - ) - + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Vector Insert Time Performance") + ax2.grid(True, alpha=0.3) + plt.tight_layout() plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150) plt.close() -def generate_summary_statistics(results, output_dir): - """Generate summary statistics and save to JSON""" - # Get unique filesystems, excluding "unknown" - filesystems = set() - for r in results: - fs = r.get("filesystem", "unknown") - if fs != "unknown": - filesystems.add(fs) - - summary = { - "total_tests": len(results), - "filesystems_tested": sorted(list(filesystems)), - "configurations": {}, - "performance_summary": { - "best_insert_qps": {"value": 0, "config": ""}, - "best_query_qps": {"value": 0, "config": ""}, - "average_insert_qps": 0, - "average_query_qps": 0, - }, - } - - # Calculate statistics - all_insert_qps = [] - all_query_qps = [] - - for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "default") - is_dev = "dev" if result.get("is_dev", False) else "baseline" - config_name = 
f"{fs}-{block_size}-{is_dev}" - - # Get actual performance metrics - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - # Calculate average query QPS - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - - all_insert_qps.append(insert_qps) - all_query_qps.append(query_qps) - - summary["configurations"][config_name] = { - "insert_qps": insert_qps, - "query_qps": query_qps, - "host": result.get("host", "unknown"), - } - - if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]: - summary["performance_summary"]["best_insert_qps"] = { - "value": insert_qps, - "config": config_name, - } - - if query_qps > summary["performance_summary"]["best_query_qps"]["value"]: - summary["performance_summary"]["best_query_qps"] = { - "value": query_qps, - "config": config_name, - } - - summary["performance_summary"]["average_insert_qps"] = ( - np.mean(all_insert_qps) if all_insert_qps else 0 - ) - summary["performance_summary"]["average_query_qps"] = ( - np.mean(all_query_qps) if all_query_qps else 0 - ) - - # Save summary - with open(os.path.join(output_dir, "summary.json"), "w") as f: - json.dump(summary, f, indent=2) - - return summary - - -def create_comprehensive_fs_comparison(results, output_dir): - """Create comprehensive filesystem performance comparison including all configurations""" - import matplotlib.pyplot as plt - import numpy as np - from collections import defaultdict - - # Collect data for all filesystem configurations - config_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: 
- fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "") - - # Create configuration label - if block_size and block_size != "default": - config_label = f"{fs}-{block_size}" - else: - config_label = fs - - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract performance metrics - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - config_data[config_label][category].append(insert_qps) - - # Sort configurations for consistent display - configs = sorted(config_data.keys()) - - # Calculate means and standard deviations - baseline_means = [] - baseline_stds = [] - dev_means = [] - dev_stds = [] - - for config in configs: - baseline_vals = config_data[config]["baseline"] - dev_vals = config_data[config]["dev"] - - baseline_means.append(np.mean(baseline_vals) if baseline_vals else 0) - baseline_stds.append(np.std(baseline_vals) if baseline_vals else 0) - dev_means.append(np.mean(dev_vals) if dev_vals else 0) - dev_stds.append(np.std(dev_vals) if dev_vals else 0) - - # Create the plot - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10)) - - x = np.arange(len(configs)) - width = 0.35 - - # Top plot: Absolute performance - baseline_bars = ax1.bar( - x - width / 2, - baseline_means, - width, - yerr=baseline_stds, - label="Baseline", - color="#1f77b4", - capsize=5, - ) - dev_bars = ax1.bar( - x + width / 2, - dev_means, - width, - yerr=dev_stds, - label="Development", - color="#ff7f0e", - capsize=5, - ) - - ax1.set_ylabel("Insert QPS") - ax1.set_title("Vector Database Performance Across Filesystem Configurations") - ax1.set_xticks(x) - ax1.set_xticklabels(configs, rotation=45, ha="right") - ax1.legend() - ax1.grid(True, alpha=0.3) - - # Add value labels on bars - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax1.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 
2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - fontsize=8, - ) - - # Bottom plot: Percentage improvement (dev vs baseline) - improvements = [] - for i in range(len(configs)): - if baseline_means[i] > 0: - improvement = ((dev_means[i] - baseline_means[i]) / baseline_means[i]) * 100 - else: - improvement = 0 - improvements.append(improvement) - - colors = ["green" if x > 0 else "red" for x in improvements] - improvement_bars = ax2.bar(x, improvements, color=colors, alpha=0.7) - - ax2.set_ylabel("Performance Change (%)") - ax2.set_title("Development vs Baseline Performance Change") - ax2.set_xticks(x) - ax2.set_xticklabels(configs, rotation=45, ha="right") - ax2.axhline(y=0, color="black", linestyle="-", linewidth=0.5) - ax2.grid(True, alpha=0.3) - - # Add percentage labels - for bar, val in zip(improvement_bars, improvements): - ax2.annotate( - f"{val:.1f}%", - xy=(bar.get_x() + bar.get_width() / 2, val), - xytext=(0, 3 if val > 0 else -15), - textcoords="offset points", - ha="center", - va="bottom" if val > 0 else "top", - fontsize=8, - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "comprehensive_fs_comparison.png"), dpi=150) - plt.close() - - -def create_fs_latency_comparison(results, output_dir): - """Create latency comparison across filesystems""" - import matplotlib.pyplot as plt - import numpy as np - from collections import defaultdict - - # Collect latency data - config_latency = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "") - - if block_size and block_size != "default": - config_label = f"{fs}-{block_size}" - else: - config_label = fs - - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract latency metrics - if "query_performance" in result: - latency_p99 = result["query_performance"].get("latency_p99_ms", 0) - else: - latency_p99 = 0 - - if latency_p99 > 0: 
-            config_latency[config_label][category].append(latency_p99)
-
-    if not config_latency:
+def create_heatmap_analysis(results, output_dir):
+    """Create multi-filesystem heatmap showing query performance"""
+    if not results:
         return
 
-    # Sort configurations
-    configs = sorted(config_latency.keys())
-
-    # Calculate statistics
-    baseline_p99 = []
-    dev_p99 = []
-
-    for config in configs:
-        baseline_vals = config_latency[config]["baseline"]
-        dev_vals = config_latency[config]["dev"]
-
-        baseline_p99.append(np.mean(baseline_vals) if baseline_vals else 0)
-        dev_p99.append(np.mean(dev_vals) if dev_vals else 0)
-
-    # Create plot
-    fig, ax = plt.subplots(figsize=(12, 6))
-
-    x = np.arange(len(configs))
-    width = 0.35
-
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_p99, width, label="Baseline P99", color="#9467bd"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_p99, width, label="Development P99", color="#e377c2"
-    )
-
-    ax.set_xlabel("Filesystem Configuration")
-    ax.set_ylabel("Latency P99 (ms)")
-    ax.set_title("Query Latency (P99) Comparison Across Filesystems")
-    ax.set_xticks(x)
-    ax.set_xticklabels(configs, rotation=45, ha="right")
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.1f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                    fontsize=8,
-                )
+    # Group data by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "query_data": [],
+        "config_key": "",
+    })
+    for result in results:
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
+
+        query_perf = result.get("query_performance", {})
+        for topk, topk_data in query_perf.items():
+            for batch, batch_data in topk_data.items():
+                qps = batch_data.get("queries_per_second", 0)
+                fs_performance[config_key]["query_data"].append({
+                    "topk": topk,
+                    "batch": batch,
+                    "qps": qps,
+                })
+        fs_performance[config_key]["config_key"] = config_key
+
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate heatmaps for each filesystem
+        num_fs = len(fs_performance)
+        fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
+        if num_fs == 1:
+            axes = [axes]
+
+        # Define common structure for consistency
+        topk_order = ["topk_1", "topk_10", "topk_100"]
+        batch_order = ["batch_1", "batch_10", "batch_100"]
+
+        for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+            # Create matrix for this filesystem
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
+            axes[idx].set_title(f"{config_key.upper()} Query Performance")
+            axes[idx].set_xticks(range(len(batch_order)))
+            axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            axes[idx].set_yticks(range(len(topk_order)))
+            axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
+                                   ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=axes[idx])
+            cbar.set_label('Queries Per Second (QPS)')
+    else:
+        # Single filesystem mode
+        fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+        if fs_performance:
+            config_key = list(fs_performance.keys())[0]
+            perf_data = fs_performance[config_key]
+
+            # Create matrix
+            topk_order = ["topk_1", "topk_10", "topk_100"]
+            batch_order = ["batch_1", "batch_10", "batch_100"]
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = ax.imshow(matrix, cmap='viridis', aspect='auto')
+            ax.set_title("Milvus Query Performance Heatmap")
+            ax.set_xticks(range(len(batch_order)))
+            ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            ax.set_yticks(range(len(topk_order)))
+            ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    ax.text(j, i, f'{matrix[i, j]:.0f}',
+                            ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=ax)
+            cbar.set_label('Queries Per Second (QPS)')
+
     plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "filesystem_latency_comparison.png"), dpi=150)
+    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
     plt.close()
 
 
@@ -1119,56 +340,23 @@ def main():
     results_dir = sys.argv[1]
     output_dir = sys.argv[2]
 
-    # Create output directory
+    # Ensure output directory exists
    os.makedirs(output_dir, exist_ok=True)
 
     # Load results
     results = load_results(results_dir)
-
     if not results:
-        print("No results found to analyze")
+        print(f"No valid results found in {results_dir}")
         sys.exit(1)
 
     print(f"Loaded {len(results)} result files")
 
     # Generate graphs
-    print("Generating performance heatmap...")
-    create_heatmap_analysis(results, output_dir)
-
-    print("Generating performance trends...")
     create_simple_performance_trends(results, output_dir)
+    create_heatmap_analysis(results, output_dir)
 
-    print("Generating summary statistics...")
-    summary = generate_summary_statistics(results, output_dir)
-
-    # Check if we have multiple filesystems to compare
-    filesystems = 
set(r.get("filesystem", "unknown") for r in results) - if len(filesystems) > 1: - print("Generating filesystem comparison chart...") - create_filesystem_comparison_chart(results, output_dir) - - print("Generating comprehensive filesystem comparison...") - create_comprehensive_fs_comparison(results, output_dir) - - print("Generating filesystem latency comparison...") - create_fs_latency_comparison(results, output_dir) - - # Check if we have XFS results with different block sizes - xfs_results = [r for r in results if r.get("filesystem") == "xfs"] - block_sizes = set(r.get("block_size", "unknown") for r in xfs_results) - if len(block_sizes) > 1: - print("Generating XFS block size analysis...") - create_block_size_analysis(results, output_dir) - - print(f"\nAnalysis complete! Graphs saved to {output_dir}") - print(f"Total configurations tested: {summary['total_tests']}") - print( - f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})" - ) - print( - f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})" - ) + print(f"Graphs generated in {output_dir}") if __name__ == "__main__": - main() + main() \ No newline at end of file diff --git a/workflows/ai/scripts/generate_html_report.py b/workflows/ai/scripts/generate_html_report.py index 3aa8342f..01ec734c 100755 --- a/workflows/ai/scripts/generate_html_report.py +++ b/workflows/ai/scripts/generate_html_report.py @@ -180,7 +180,7 @@ HTML_TEMPLATE = """
    -

    AI Vector Database Benchmark Results

    +

    Milvus Vector Database Benchmark Results

    Generated on {timestamp}
    @@ -238,11 +238,13 @@ HTML_TEMPLATE = """
-

Detailed Results Table

+

Milvus Performance by Storage Filesystem

+

This table shows how the Milvus vector database performs when its data is stored on different filesystem types and configurations.

- + + @@ -293,27 +295,53 @@ def load_results(results_dir): # Get filesystem from JSON data fs_type = data.get("filesystem", None) - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type and "debian13-ai" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") + # Always try to parse from filename first since JSON data might be wrong + if "-ai-" in filename: + # Handle both debian13-ai- and prod-ai- prefixes + cleaned_filename = filename.replace("results_", "") + + # Extract the part after -ai- + if "debian13-ai-" in cleaned_filename: + host_part = cleaned_filename.replace("debian13-ai-", "") + elif "prod-ai-" in cleaned_filename: + host_part = cleaned_filename.replace("prod-ai-", "") + else: + # Generic extraction + ai_index = cleaned_filename.find("-ai-") + if ai_index != -1: + host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-" + else: + host_part = cleaned_filename + + # Remove file extensions and dev suffix + host_part = ( + host_part.replace("_1.json", "") .replace("_2.json", "") .replace("_3.json", "") - .split("-") + .replace("-dev", "") ) - if "xfs" in host_parts[0]: + + # Parse filesystem type and block size + if host_part.startswith("xfs-"): fs_type = "xfs" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "ext4" in host_parts[0]: + # Extract block size: xfs-4k-4ks -> 4k + parts = host_part.split("-") + if len(parts) >= 2: + block_size = parts[1] # 4k, 16k, 32k, 64k + else: + block_size = "4k" + elif host_part.startswith("ext4-"): fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: + parts = host_part.split("-") + block_size = parts[1] if len(parts) > 1 else "4k" + elif host_part.startswith("btrfs"): fs_type = "btrfs" block_size = "default" else: - fs_type = "unknown" - block_size = "unknown" + # Fallback to JSON data if available + if not fs_type: + fs_type = "unknown" + block_size = 
"unknown" else: # Set appropriate block size based on filesystem if fs_type == "btrfs": @@ -371,12 +399,36 @@ def generate_table_rows(results, best_configs): if config_key in best_configs: row_class += " best-config" + # Generate descriptive labels showing Milvus is running on this filesystem + if result["filesystem"] == "xfs" and result["block_size"] != "default": + storage_label = f"XFS {result['block_size'].upper()}" + config_details = f"Block size: {result['block_size']}, Milvus data on XFS" + elif result["filesystem"] == "ext4": + storage_label = "EXT4" + if "bigalloc" in result.get("host", "").lower(): + config_details = "EXT4 with bigalloc, Milvus data on ext4" + else: + config_details = ( + f"Block size: {result['block_size']}, Milvus data on ext4" + ) + elif result["filesystem"] == "btrfs": + storage_label = "BTRFS" + config_details = "Default Btrfs settings, Milvus data on Btrfs" + else: + storage_label = result["filesystem"].upper() + config_details = f"Milvus data on {result['filesystem']}" + + # Extract clean node identifier from hostname + node_name = result["host"].replace("results_", "").replace(".json", "") + row = f""" - + + + """ @@ -483,8 +535,8 @@ def generate_html_report(results_dir, graphs_dir, output_path):
  • Block Size Analysis
  • """ filesystem_comparison_section = """
    -

    Filesystem Performance Comparison

    -

    Comparison of vector database performance across different filesystems, showing both baseline and development kernel results.

    +

    Milvus Storage Filesystem Comparison

    +

    Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.

    Filesystem Comparison
    @@ -499,9 +551,9 @@ def generate_html_report(results_dir, graphs_dir, output_path):
    """ # Multi-fs mode: show filesystem info - fourth_card_title = "Filesystems Tested" + fourth_card_title = "Storage Filesystems" fourth_card_value = str(len(filesystems_tested)) - fourth_card_label = ", ".join(filesystems_tested).upper() + fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data" else: # Single filesystem mode - hide multi-fs sections filesystem_nav_items = "" -- 2.50.1
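A note on the matrix-filling step in `create_heatmap_analysis`: it is easy to verify in isolation by factoring it into a standalone helper. The sketch below mirrors the patch's nested `query_performance` layout (`topk_* -> batch_* -> queries_per_second`); the helper name `build_qps_matrix` and the sample numbers are illustrative, not part of the patch:

```python
import numpy as np

def build_qps_matrix(query_perf, topk_order, batch_order):
    """Flatten nested {topk: {batch: {"queries_per_second": ...}}} results
    into a (len(topk_order) x len(batch_order)) matrix; missing cells are 0."""
    matrix = np.zeros((len(topk_order), len(batch_order)))
    for i, topk in enumerate(topk_order):
        for j, batch in enumerate(batch_order):
            cell = query_perf.get(topk, {}).get(batch, {})
            matrix[i, j] = cell.get("queries_per_second", 0)
    return matrix

# Made-up sample data exercising both present and absent cells
query_perf = {
    "topk_1": {"batch_1": {"queries_per_second": 1200.0}},
    "topk_10": {"batch_100": {"queries_per_second": 340.0}},
}
m = build_qps_matrix(
    query_perf,
    ["topk_1", "topk_10", "topk_100"],
    ["batch_1", "batch_10", "batch_100"],
)
```

Defaulting absent (topk, batch) combinations to 0 keeps every per-filesystem heatmap the same shape, which is what lets the patch place them side by side with a shared axis layout.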
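The filename-parsing branch added to `load_results` in generate_html_report.py can likewise be exercised on its own. This is a condensed sketch of the parsing rules in the hunk above (strip the `results_` prefix, take the part after `-ai-`, drop iteration suffixes and the `-dev` marker, then split filesystem and block size); the function name and example filenames are made up for illustration:

```python
def parse_fs_config(filename):
    """Derive (filesystem, block_size) from a results filename such as
    'results_debian13-ai-xfs-4k-4ks_1.json'."""
    name = filename.replace("results_", "")
    # Take the host part after "-ai-" (covers debian13-ai-, prod-ai-, etc.)
    ai_index = name.find("-ai-")
    host_part = name[ai_index + 4:] if ai_index != -1 else name
    # Drop iteration suffixes and the -dev marker
    for suffix in ("_1.json", "_2.json", "_3.json"):
        host_part = host_part.replace(suffix, "")
    host_part = host_part.replace("-dev", "")
    # Split filesystem type and block size
    if host_part.startswith("xfs-"):
        parts = host_part.split("-")
        return "xfs", parts[1] if len(parts) >= 2 else "4k"
    if host_part.startswith("ext4-"):
        parts = host_part.split("-")
        return "ext4", parts[1] if len(parts) > 1 else "4k"
    if host_part.startswith("btrfs"):
        return "btrfs", "default"
    return "unknown", "unknown"
```

Since the patch notes the JSON's `filesystem` field may be wrong, this filename-derived value is used first, with the JSON field only as a fallback.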