From mboxrd@z Thu Jan  1 00:00:00 1970
From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, hui81.qi@samsung.com, kundan.kumar@samsung.com,
	kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
Date: Wed, 27 Aug 2025 02:32:01 -0700
Message-ID: <20250827093202.3539990-3-mcgrof@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250827093202.3539990-1-mcgrof@kernel.org>
References: <20250827093202.3539990-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain

Extend the AI workflow to support testing Milvus across multiple
filesystem configurations simultaneously. This enables comprehensive
performance comparisons between different filesystems and their
configuration options.
Key features:

  - Dynamic node generation based on enabled filesystem configurations
  - Support for XFS, EXT4, and BTRFS with various mount options
  - Per-filesystem result collection and analysis
  - A/B testing across all filesystem configurations
  - Automated comparison graphs between filesystems

Filesystem configurations:

  - XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
  - EXT4: default, nojournal, bigalloc configurations
  - BTRFS: default, zlib, lzo, zstd compression options

Defconfigs:

  - ai-milvus-multifs: Test 7 filesystem configs with A/B testing
  - ai-milvus-multifs-distro: Test with distribution kernels
  - ai-milvus-multifs-extended: Extended configs (14 filesystems total)

Node generation:

The system dynamically generates nodes based on enabled filesystem
configurations. With A/B testing enabled, this creates baseline and dev
nodes for each filesystem (e.g., debian13-ai-xfs-4k and
debian13-ai-xfs-4k-dev).

Usage:

  make defconfig-ai-milvus-multifs
  make bringup      # Creates nodes for each filesystem
  make ai           # Setup infrastructure on all nodes
  make ai-tests     # Run benchmarks on all filesystems
  make ai-results   # Collect and compare results

This enables systematic evaluation of how different filesystems and
their configurations affect vector database performance.
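For reviewers, the baseline/dev node naming described above can be sketched as follows. This is a hypothetical illustration only, not the actual gen_nodes template logic; the helper name and base-name parameter are invented for the example:

```python
def generate_node_names(base, fs_configs, ab_testing=True):
    """Derive per-filesystem node names the way the commit message
    describes: one baseline node per enabled filesystem config, plus a
    matching "-dev" node when A/B testing is enabled."""
    nodes = []
    for cfg in fs_configs:
        name = f"{base}-{cfg}"
        nodes.append(name)               # baseline node
        if ab_testing:
            nodes.append(f"{name}-dev")  # dev node for A/B comparison
    return nodes

# Example from the commit message: XFS with a 4k block size
print(generate_node_names("debian13-ai", ["xfs-4k"]))
# ['debian13-ai-xfs-4k', 'debian13-ai-xfs-4k-dev']
```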
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain
---
 .github/workflows/docker-tests.yml            |    6 +
 Makefile                                      |    2 +-
 defconfigs/ai-milvus-multifs                  |   67 +
 defconfigs/ai-milvus-multifs-distro           |  109 ++
 defconfigs/ai-milvus-multifs-extended         |  108 ++
 docs/ai/vector-databases/README.md            |    1 -
 playbooks/ai_install.yml                      |    6 +
 playbooks/ai_multifs.yml                      |   24 +
 .../host_vars/debian13-ai-xfs-4k-4ks.yml      |   10 -
 .../files/analyze_results.py                  | 1132 +++++++++++---
 .../files/generate_better_graphs.py           |   16 +-
 .../files/generate_graphs.py                  |  888 ++++------
 .../files/generate_html_report.py             |  263 +++-
 .../roles/ai_collect_results/tasks/main.yml   |   42 +-
 .../templates/analysis_config.json.j2         |    2 +-
 .../roles/ai_milvus_storage/tasks/main.yml    |  161 ++
 .../tasks/generate_comparison.yml             |  279 ++++
 playbooks/roles/ai_multifs_run/tasks/main.yml |   23 +
 .../tasks/run_single_filesystem.yml           |  104 ++
 .../templates/milvus_config.json.j2           |   42 +
 .../roles/ai_multifs_setup/defaults/main.yml  |   49 +
 .../roles/ai_multifs_setup/tasks/main.yml     |   70 +
 .../files/milvus_benchmark.py                 |  164 +-
 playbooks/roles/gen_hosts/tasks/main.yml      |   19 +
 .../roles/gen_hosts/templates/fstests.j2      |    2 +
 playbooks/roles/gen_hosts/templates/gitr.j2   |    2 +
 playbooks/roles/gen_hosts/templates/hosts.j2  |   35 +-
 .../roles/gen_hosts/templates/nfstest.j2      |    2 +
 playbooks/roles/gen_hosts/templates/pynfs.j2  |    2 +
 playbooks/roles/gen_nodes/tasks/main.yml      |   90 ++
 .../roles/guestfs/tasks/bringup/main.yml      |   15 +
 scripts/guestfs.Makefile                      |    2 +-
 workflows/ai/Kconfig                          |   13 +
 workflows/ai/Kconfig.fs                       |  118 ++
 workflows/ai/Kconfig.multifs                  |  184 +++
 workflows/ai/scripts/analysis_config.json     |    2 +-
 workflows/ai/scripts/analyze_results.py       | 1132 +++++++++++---
 workflows/ai/scripts/generate_graphs.py       | 1372 ++++-----------
 workflows/ai/scripts/generate_html_report.py  |   94 +-
 39 files changed, 4356 insertions(+), 2296 deletions(-)
 create mode 100644 defconfigs/ai-milvus-multifs
 create mode 100644 defconfigs/ai-milvus-multifs-distro
 create mode 100644 defconfigs/ai-milvus-multifs-extended
 create mode 100644 playbooks/ai_multifs.yml
 delete mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
 create mode 100644 playbooks/roles/ai_milvus_storage/tasks/main.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/main.yml
 create mode 100644 playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
 create mode 100644 playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
 create mode 100644 playbooks/roles/ai_multifs_setup/defaults/main.yml
 create mode 100644 playbooks/roles/ai_multifs_setup/tasks/main.yml
 create mode 100644 workflows/ai/Kconfig.fs
 create mode 100644 workflows/ai/Kconfig.multifs

diff --git a/.github/workflows/docker-tests.yml b/.github/workflows/docker-tests.yml
index c0e0d03d..adea1182 100644
--- a/.github/workflows/docker-tests.yml
+++ b/.github/workflows/docker-tests.yml
@@ -53,3 +53,9 @@ jobs:
           echo "Running simple make targets on ${{ matrix.distro_container }} environment"
           make mrproper
+      - name: Test fio-tests defconfig
+        run: |
+          echo "Testing fio-tests CI configuration"
+          make defconfig-fio-tests-ci
+          make
+          echo "Configuration test passed for fio-tests"
diff --git a/Makefile b/Makefile
index 8755577e..83c67340 100644
--- a/Makefile
+++ b/Makefile
@@ -226,7 +226,7 @@ include scripts/bringup.Makefile
 endif
 
 DEFAULT_DEPS += $(ANSIBLE_INVENTORY_FILE)
-$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE)
+$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE) $(KDEVOPS_NODES)
 	$(Q)ANSIBLE_LOCALHOST_WARNING=False ANSIBLE_INVENTORY_UNPARSED_WARNING=False \
 		ansible-playbook $(ANSIBLE_VERBOSE) \
 		$(KDEVOPS_PLAYBOOKS_DIR)/gen_hosts.yml \
diff --git a/defconfigs/ai-milvus-multifs b/defconfigs/ai-milvus-multifs
new file mode 100644
index 00000000..7e5ad971
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs
@@ -0,0 +1,67 @@
+CONFIG_GUESTFS=y
+CONFIG_LIBVIRT=y
+
+# Disable mirror features for CI/testing
+# CONFIG_ENABLE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_USE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_INSTALL_ONLY_GIT_DAEMON is not set
+# CONFIG_MIRROR_INSTALL is not set
+
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+
+CONFIG_BOOTLINUX=y
+CONFIG_BOOTLINUX_9P=y
+
+# Enable A/B testing with different kernel references
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
+
+# AI workflow configuration
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# Vector database configuration
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Enable multi-filesystem testing
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+
+# Test XFS with different block sizes
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# Test EXT4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# Test BTRFS
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Performance settings
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_ITERATIONS=5
+
+# Dataset configuration for benchmarking
+CONFIG_AI_VECTOR_DB_MILVUS_DATASET_SIZE=100000
+CONFIG_AI_VECTOR_DB_MILVUS_BATCH_SIZE=10000
+CONFIG_AI_VECTOR_DB_MILVUS_NUM_QUERIES=10000
+
+# Container configuration
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5=y
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
\ No newline at end of file
diff --git a/defconfigs/ai-milvus-multifs-distro b/defconfigs/ai-milvus-multifs-distro
new file mode 100644
index 00000000..fb71f2b5
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-distro
@@ -0,0 +1,109 @@
+# AI Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k and 16k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile) using the distribution kernel without A/B testing.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_MILVUS_DOCKER=y
+CONFIG_AI_VECTOR_DB_TYPE_MILVUS=y
+
+# Milvus Configuration
+CONFIG_AI_MILVUS_HOST="localhost"
+CONFIG_AI_MILVUS_PORT=19530
+CONFIG_AI_MILVUS_DATABASE_NAME="ai_benchmark"
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# XFS configurations
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Disable A/B testing to use single baseline configuration
+# CONFIG_KDEVOPS_BASELINE_AND_DEV is not set
diff --git a/defconfigs/ai-milvus-multifs-extended b/defconfigs/ai-milvus-multifs-extended
new file mode 100644
index 00000000..7886c8c4
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-extended
@@ -0,0 +1,108 @@
+# AI Extended Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k, 16k, 32k, 64k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile) using the distribution kernel without A/B testing.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# Extended XFS configurations (4k, 16k, 32k, 64k block sizes)
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Baseline/dev testing setup
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+# Build Linux
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
diff --git a/docs/ai/vector-databases/README.md b/docs/ai/vector-databases/README.md
index 2a3955d7..0fdd204b 100644
--- a/docs/ai/vector-databases/README.md
+++ b/docs/ai/vector-databases/README.md
@@ -52,7 +52,6 @@ Vector databases heavily depend on storage performance. The workflow tests across:
 
 - **XFS**: Default for many production deployments
 - **ext4**: Traditional Linux filesystem
 - **btrfs**: Copy-on-write with compression support
-- **ZFS**: Advanced features for data integrity
 
 ## Configuration Dimensions
diff --git a/playbooks/ai_install.yml b/playbooks/ai_install.yml
index 70b734e4..38e6671c 100644
--- a/playbooks/ai_install.yml
+++ b/playbooks/ai_install.yml
@@ -4,5 +4,11 @@
   become: true
   become_user: root
   roles:
+    - role: ai_docker_storage
+      when: ai_docker_storage_enable | default(true)
+      tags: ['ai', 'docker', 'storage']
+    - role: ai_milvus_storage
+      when: ai_milvus_storage_enable | default(false)
+      tags: ['ai', 'milvus', 'storage']
     - role: milvus
       tags: ['ai', 'vector_db', 'milvus', 'install']
diff --git a/playbooks/ai_multifs.yml b/playbooks/ai_multifs.yml
new file mode 100644
index 00000000..637f11f4
--- /dev/null
+++ b/playbooks/ai_multifs.yml
@@ -0,0 +1,24 @@
+---
+- hosts: baseline
+  become: yes
+  gather_facts: yes
+  vars:
+    ai_benchmark_results_dir: "{{ ai_multifs_results_dir | default('/data/ai-multifs-benchmark') }}"
+  roles:
+    - role: ai_multifs_setup
+    - role: ai_multifs_run
+  tasks:
+    - name: Final multi-filesystem testing summary
+      debug:
+        msg: |
+          Multi-filesystem AI benchmark testing completed!
+
+          Results directory: {{ ai_multifs_results_dir }}
+          Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
+
+          Individual filesystem results:
+          {% for config in ai_multifs_configurations %}
+          {% if config.enabled %}
+          - {{ config.name }}: {{ ai_multifs_results_dir }}/{{ config.name }}/
+          {% endif %}
+          {% endfor %}
diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
deleted file mode 100644
index ffe9eb28..00000000
--- a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
+++ /dev/null
@@ -1,10 +0,0 @@
----
-# XFS 4k block, 4k sector configuration
-ai_docker_fstype: "xfs"
-ai_docker_xfs_blocksize: 4096
-ai_docker_xfs_sectorsize: 4096
-ai_docker_xfs_mkfs_opts: ""
-filesystem_type: "xfs"
-filesystem_block_size: "4k-4ks"
-ai_filesystem: "xfs"
-ai_data_device_path: "/var/lib/docker"
\ No newline at end of file
diff --git a/playbooks/roles/ai_collect_results/files/analyze_results.py b/playbooks/roles/ai_collect_results/files/analyze_results.py
index 3d11fb11..2dc4a1d6 100755
--- a/playbooks/roles/ai_collect_results/files/analyze_results.py
+++ b/playbooks/roles/ai_collect_results/files/analyze_results.py
@@ -226,6 +226,68 @@ class ResultsAnalyzer:
 
         return fs_info
 
+    def _extract_filesystem_config(
+        self, result: Dict[str, Any]
+    ) -> tuple[str, str, str]:
+        """Extract filesystem type and block size from result data.
+        Returns (fs_type, block_size, config_key)"""
+        filename = result.get("_file", "")
+
+        # Primary: Extract filesystem type from filename (more reliable than JSON)
+        fs_type = "unknown"
+        block_size = "default"
+
+        if "xfs" in filename:
+            fs_type = "xfs"
+            # Check larger sizes first to avoid substring matches
+            if "64k" in filename and "64k-" in filename:
+                block_size = "64k"
+            elif "32k" in filename and "32k-" in filename:
+                block_size = "32k"
+            elif "16k" in filename and "16k-" in filename:
+                block_size = "16k"
+            elif "4k" in filename and "4k-" in filename:
+                block_size = "4k"
+        elif "ext4" in filename:
+            fs_type = "ext4"
+            if "16k" in filename:
+                block_size = "16k"
+            elif "4k" in filename:
+                block_size = "4k"
+        elif "btrfs" in filename:
+            fs_type = "btrfs"
+            block_size = "default"
+        else:
+            # Fallback to JSON data if filename parsing fails
+            fs_type = result.get("filesystem", "unknown")
+            self.logger.warning(
+                f"Could not determine filesystem from filename (unknown), using JSON data: {fs_type}"
+            )
+
+        config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+        return fs_type, block_size, config_key
+
+    def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
+        """Extract node hostname and determine if it's a dev node.
+        Returns (hostname, is_dev_node)"""
+        # Get hostname from system_info (preferred) or fall back to filename
+        system_info = result.get("system_info", {})
+        hostname = system_info.get("hostname", "")
+
+        # If no hostname in system_info, try extracting from filename
+        if not hostname:
+            filename = result.get("_file", "")
+            # Remove results_ prefix and .json suffix
+            hostname = filename.replace("results_", "").replace(".json", "")
+            # Remove iteration number if present (_1, _2, etc.)
+            if "_" in hostname and hostname.split("_")[-1].isdigit():
+                hostname = "_".join(hostname.split("_")[:-1])
+
+        # Determine if this is a dev node
+        is_dev = hostname.endswith("-dev")
+
+        return hostname, is_dev
+
     def load_results(self) -> bool:
         """Load all result files from the results directory"""
         try:
@@ -391,6 +453,8 @@ class ResultsAnalyzer:
         html.append(
             "        .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
         )
+        html.append("        .baseline-row { background-color: #e8f5e9; }")
+        html.append("        .dev-row { background-color: #e3f2fd; }")
         html.append("    </style>")
         html.append("</head>")
         html.append("<body>")
@@ -486,26 +550,69 @@ class ResultsAnalyzer:
         else:
             html.append(
                 "            <p>No storage device information available.</p>"
             )
 
-        # Filesystem section
-        html.append("        <div class='section'>")
-        html.append("            <h2>🗂️ Filesystem Configuration</h2>")
-        fs_info = self.system_info.get("filesystem_info", {})
-        html.append("            <table>")
-        html.append(
-            "                <tr><td>Filesystem Type</td><td>"
-            + str(fs_info.get("filesystem_type", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append(
-            "                <tr><td>Mount Point</td><td>"
-            + str(fs_info.get("mount_point", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append(
-            "                <tr><td>Mount Options</td><td>"
-            + str(fs_info.get("mount_options", "Unknown"))
-            + "</td></tr>"
-        )
-        html.append("            </table>")
+        # Node Configuration section - Extract from actual benchmark results
+        html.append("        <div class='section'>")
+        html.append("            <h2>🗂️ Node Configuration</h2>")
+
+        # Collect node and filesystem information from benchmark results
+        node_configs = {}
+        for result in self.results_data:
+            # Extract node information
+            hostname, is_dev = self._extract_node_info(result)
+            fs_type, block_size, config_key = self._extract_filesystem_config(
+                result
+            )
+
+            system_info = result.get("system_info", {})
+            data_path = system_info.get("data_path", "/data/milvus")
+            mount_point = system_info.get("mount_point", "/data")
+            kernel_version = system_info.get("kernel_version", "unknown")
+
+            if hostname not in node_configs:
+                node_configs[hostname] = {
+                    "hostname": hostname,
+                    "node_type": "Development" if is_dev else "Baseline",
+                    "filesystem": fs_type,
+                    "block_size": block_size,
+                    "data_path": data_path,
+                    "mount_point": mount_point,
+                    "kernel": kernel_version,
+                    "test_count": 0,
+                }
+            node_configs[hostname]["test_count"] += 1
+
+        if node_configs:
+            html.append("            <table>")
+            html.append(
+                "                <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
+            )
+            # Sort nodes with baseline first, then dev
+            sorted_nodes = sorted(
+                node_configs.items(),
+                key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+            )
+            for hostname, config_info in sorted_nodes:
+                row_class = (
+                    "dev-row"
+                    if config_info["node_type"] == "Development"
+                    else "baseline-row"
+                )
+                html.append(f"                <tr class='{row_class}'>")
+                html.append(f"                    <td>{hostname}</td>")
+                html.append(f"                    <td>{config_info['node_type']}</td>")
+                html.append(f"                    <td>{config_info['filesystem']}</td>")
+                html.append(f"                    <td>{config_info['block_size']}</td>")
+                html.append(f"                    <td>{config_info['data_path']}</td>")
+                html.append(
+                    f"                    <td>{config_info['mount_point']}</td>"
+                )
+                html.append(f"                    <td>{config_info['kernel']}</td>")
+                html.append(f"                    <td>{config_info['test_count']}</td>")
+                html.append(f"                </tr>")
+            html.append("            </table>")
+        else:
+            html.append(
+                "            <p>No node configuration data found in results.</p>"
+            )
         html.append("        </div>")
 
         # Test Configuration Section
@@ -551,92 +658,192 @@ class ResultsAnalyzer:
             html.append("            </table>")
         html.append("        </div>")
 
-        # Performance Results Section
+        # Performance Results Section - Per Node
         html.append("        <div class='section'>")
-        html.append("            <h2>📊 Performance Results Summary</h2>")
+        html.append("            <h2>📊 Performance Results by Node</h2>")
 
         if self.results_data:
-            # Insert performance
-            insert_times = [
-                r.get("insert_performance", {}).get("total_time_seconds", 0)
-                for r in self.results_data
-            ]
-            insert_rates = [
-                r.get("insert_performance", {}).get("vectors_per_second", 0)
-                for r in self.results_data
-            ]
-
-            if insert_times and any(t > 0 for t in insert_times):
-                html.append("            <h3>📈 Vector Insert Performance</h3>")
-                html.append("            <table>")
-                html.append(
-                    f"                <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
-                )
-                html.append("            </table>")
+            # Group results by node
+            node_performance = {}
+
+            for result in self.results_data:
+                # Use node hostname as the grouping key
+                hostname, is_dev = self._extract_node_info(result)
+                fs_type, block_size, config_key = self._extract_filesystem_config(
+                    result
+                )
 
-            # Index performance
-            index_times = [
-                r.get("index_performance", {}).get("creation_time_seconds", 0)
-                for r in self.results_data
-            ]
-            if index_times and any(t > 0 for t in index_times):
-                html.append("            <h3>🔗 Index Creation Performance</h3>")
-                html.append("            <table>")
-                html.append(
-                    f"                <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
-                )
-                html.append(
-                    f"                <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
-                )
-                html.append("            </table>")
+                if hostname not in node_performance:
+                    node_performance[hostname] = {
+                        "hostname": hostname,
+                        "node_type": "Development" if is_dev else "Baseline",
+                        "insert_rates": [],
+                        "insert_times": [],
+                        "index_times": [],
+                        "query_performance": {},
+                        "filesystem": fs_type,
+                        "block_size": block_size,
+                    }
+
+                # Add insert performance
+                insert_perf = result.get("insert_performance", {})
+                if insert_perf:
+                    rate = insert_perf.get("vectors_per_second", 0)
+                    time = insert_perf.get("total_time_seconds", 0)
+                    if rate > 0:
+                        node_performance[hostname]["insert_rates"].append(rate)
+                    if time > 0:
+                        node_performance[hostname]["insert_times"].append(time)
+
+                # Add index performance
+                index_perf = result.get("index_performance", {})
+                if index_perf:
+                    time = index_perf.get("creation_time_seconds", 0)
+                    if time > 0:
+                        node_performance[hostname]["index_times"].append(time)
+
+                # Collect query performance (use first result for each node)
+                query_perf = result.get("query_performance", {})
+                if (
+                    query_perf
+                    and not node_performance[hostname]["query_performance"]
+                ):
+                    node_performance[hostname]["query_performance"] = query_perf
+
+            # Display results for each node, sorted with baseline first
+            sorted_nodes = sorted(
+                node_performance.items(),
+                key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+            )
+            for hostname, perf_data in sorted_nodes:
+                node_type_badge = (
+                    "🔵" if perf_data["node_type"] == "Development" else "🟢"
+                )
+                html.append(
+                    f"            <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
+                )
+                html.append(
+                    f"            <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
+                )
 
-            # Query performance
-            html.append("            <h3>🔍 Query Performance</h3>")
-            first_query_perf = self.results_data[0].get("query_performance", {})
-            if first_query_perf:
-                html.append("            <table>")
-                html.append(
-                    "                <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
-                )
-                for topk, topk_data in first_query_perf.items():
-                    for batch, batch_data in topk_data.items():
-                        qps = batch_data.get("queries_per_second", 0)
-                        avg_time = batch_data.get("average_time_seconds", 0) * 1000
-
-                        # Color coding for performance
-                        qps_class = ""
-                        if qps > 1000:
-                            qps_class = "performance-good"
-                        elif qps > 100:
-                            qps_class = "performance-warning"
-                        else:
-                            qps_class = "performance-poor"
-
-                        html.append(f"                <tr>")
-                        html.append(
-                            f"                    <td>{topk.replace('topk_', 'Top-')}</td>"
-                        )
-                        html.append(
-                            f"                    <td>{batch.replace('batch_', 'Batch ')}</td>"
-                        )
-                        html.append(
-                            f"                    <td class='{qps_class}'>{qps:.2f}</td>"
-                        )
-                        html.append(f"                    <td>{avg_time:.2f}</td>")
-                        html.append(f"                </tr>")
+                # Insert performance
+                insert_rates = perf_data["insert_rates"]
+                if insert_rates:
+                    html.append("            <h4>📈 Vector Insert Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        f"                <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
+                    )
+                    html.append("            </table>")
+
+                # Index performance
+                index_times = perf_data["index_times"]
+                if index_times:
+                    html.append("            <h4>🔗 Index Creation Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        f"                <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
+                    )
+                    html.append(
+                        f"                <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
+                    )
+                    html.append("            </table>")
+
+                # Query performance
+                query_perf = perf_data["query_performance"]
+                if query_perf:
+                    html.append("            <h4>🔍 Query Performance</h4>")
+                    html.append("            <table>")
+                    html.append(
+                        "                <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+                    )
 
-                html.append("            </table>")
+                    for topk, topk_data in query_perf.items():
+                        for batch, batch_data in topk_data.items():
+                            qps = batch_data.get("queries_per_second", 0)
+                            avg_time = (
+                                batch_data.get("average_time_seconds", 0) * 1000
+                            )
+
+                            # Color coding for performance
+                            qps_class = ""
+                            if qps > 1000:
+                                qps_class = "performance-good"
+                            elif qps > 100:
+                                qps_class = "performance-warning"
+                            else:
+                                qps_class = "performance-poor"
+
+                            html.append(f"                <tr>")
+                            html.append(
+                                f"                    <td>{topk.replace('topk_', 'Top-')}</td>"
+                            )
+                            html.append(
+                                f"                    <td>{batch.replace('batch_', 'Batch ')}</td>"
+                            )
+                            html.append(
+                                f"                    <td class='{qps_class}'>{qps:.2f}</td>"
+                            )
+                            html.append(f"                    <td>{avg_time:.2f}</td>")
+                            html.append(f"                </tr>")
+                    html.append("            </table>")
+
+                html.append("            <br>")  # Add spacing between configurations
 
         html.append("        </div>")
 
         # Footer
+        # Performance Graphs Section
+        html.append("        <div class='section'>")
+        html.append("            <h2>📈 Performance Visualizations</h2>")
+        html.append(
+            "            <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
+        )
+        html.append("            <ul>")
+        html.append(
+            "                <li><b>Insert Performance</b>: Shows vector insertion rates and times for each filesystem configuration</li>"
+        )
+        html.append(
+            "                <li><b>Query Performance</b>: Displays query performance heatmaps for different Top-K and batch sizes</li>"
+        )
+        html.append(
+            "                <li><b>Index Performance</b>: Compares index creation times across filesystems</li>"
+        )
+        html.append(
+            "                <li><b>Performance Matrix</b>: Comprehensive comparison matrix of all metrics</li>"
+        )
+        html.append(
+            "                <li><b>Filesystem Comparison</b>: Side-by-side comparison of filesystem performance</li>"
+        )
+        html.append("            </ul>")
+        html.append(
+            "            <p>Note: Graphs are generated as separate PNG files in the same directory as this report.</p>"
+        )
+        html.append("            <div>")
+        html.append(
+            "                <a href='...'>Insert Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Query Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Index Performance</a>"
+        )
+        html.append(
+            "                <a href='...'>Performance Matrix</a>"
+        )
+        html.append(
+            "                <a href='...'>Filesystem Comparison</a>"
+        )
+        html.append("            </div>")
+        html.append("        </div>")
+
         html.append("        <div class='section'>")
         html.append("            <h2>📝 Notes</h2>")
         html.append("            <ul>")
@@ -661,10 +868,11 @@ class ResultsAnalyzer:
             return "\n".join(html)
 
         except Exception as e:
-            self.logger.error(f"Error generating HTML report: {e}")
-            return (
-                f"<html><body><p>Error generating HTML report: {e}</p></body></html>"
-            )
+            import traceback
+
+            tb = traceback.format_exc()
+            self.logger.error(f"Error generating HTML report: {e}\n{tb}")
+            return f"<html><body><p>Error generating HTML report: {e}</p><pre>{tb}</pre></body></html>"
 
     def generate_graphs(self) -> bool:
         """Generate performance visualization graphs"""
@@ -691,6 +899,9 @@ class ResultsAnalyzer:
             # Graph 4: Performance Comparison Matrix
             self._plot_performance_matrix()
 
+            # Graph 5: Multi-filesystem Comparison (if applicable)
+            self._plot_filesystem_comparison()
+
             self.logger.info("Graphs generated successfully")
             return True
 
@@ -699,34 +910,188 @@ class ResultsAnalyzer:
             return False
 
     def _plot_insert_performance(self):
-        """Plot insert performance metrics"""
-        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+        """Plot insert performance metrics with node differentiation"""
+        # Group data by node
+        node_performance = {}
 
-        # Extract insert data
-        iterations = []
-        insert_rates = []
-        insert_times = []
+        for result in self.results_data:
+            hostname, is_dev = self._extract_node_info(result)
+
+            if hostname not in node_performance:
+                node_performance[hostname] = {
+                    "insert_rates": [],
+                    "insert_times": [],
+                    "iterations": [],
+                    "is_dev": is_dev,
+                }
 
-        for i, result in enumerate(self.results_data):
             insert_perf = result.get("insert_performance", {})
             if insert_perf:
-                iterations.append(i + 1)
-                insert_rates.append(insert_perf.get("vectors_per_second", 0))
-                insert_times.append(insert_perf.get("total_time_seconds", 0))
-
-        # Plot insert rate
-        ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
-        ax1.set_xlabel("Iteration")
-        ax1.set_ylabel("Vectors/Second")
-        ax1.set_title("Vector Insert Rate Performance")
-        ax1.grid(True, alpha=0.3)
-
-        # Plot insert time
-        ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
-        ax2.set_xlabel("Iteration")
-        ax2.set_ylabel("Total Time (seconds)")
-        ax2.set_title("Vector Insert Time Performance")
-        ax2.grid(True, alpha=0.3)
+                node_performance[hostname]["insert_rates"].append(
+                    insert_perf.get("vectors_per_second", 0)
+                )
+                node_performance[hostname]["insert_times"].append(
+                    insert_perf.get("total_time_seconds", 0)
+                )
node_performance[hostname]["iterations"].append( + len(node_performance[hostname]["insert_rates"]) + ) + + # Check if we have multiple nodes + if len(node_performance) > 1: + # Multi-node mode: separate lines for each node + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7)) + + # Sort nodes with baseline first, then dev + sorted_nodes = sorted( + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + + # Create color palettes for baseline and dev nodes + baseline_colors = [ + "#2E7D32", + "#43A047", + "#66BB6A", + "#81C784", + "#A5D6A7", + "#C8E6C9", + ] # Greens + dev_colors = [ + "#0D47A1", + "#1565C0", + "#1976D2", + "#1E88E5", + "#2196F3", + "#42A5F5", + "#64B5F6", + ] # Blues + + # Additional colors if needed + extra_colors = [ + "#E65100", + "#F57C00", + "#FF9800", + "#FFB300", + "#FFC107", + "#FFCA28", + ] # Oranges + + # Line styles to cycle through + line_styles = ["-", "--", "-.", ":"] + markers = ["o", "s", "^", "v", "D", "p", "*", "h"] + + baseline_idx = 0 + dev_idx = 0 + + # Use different colors and styles for each node + for idx, (hostname, perf_data) in enumerate(sorted_nodes): + if not perf_data["insert_rates"]: + continue + + # Choose color and style based on node type and index + if perf_data["is_dev"]: + # Development nodes - blues + color = dev_colors[dev_idx % len(dev_colors)] + linestyle = line_styles[ + (dev_idx // len(dev_colors)) % len(line_styles) + ] + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev + label = f"{hostname} (Dev)" + dev_idx += 1 + else: + # Baseline nodes - greens + color = baseline_colors[baseline_idx % len(baseline_colors)] + linestyle = line_styles[ + (baseline_idx // len(baseline_colors)) % len(line_styles) + ] + marker = markers[ + baseline_idx % 4 + ] # Use first 4 markers for baseline + label = f"{hostname} (Baseline)" + baseline_idx += 1 + + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate with alpha for better visibility + ax1.plot( + iterations, 
+ perf_data["insert_rates"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title("Milvus Insert Rate by Node") + ax1.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Milvus Insert Time by Node") + ax2.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + plt.suptitle( + "Insert Performance Analysis: Baseline vs Development", + fontsize=14, + y=1.02, + ) + else: + # Single node mode: original behavior + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + + # Extract insert data from single node + hostname = list(node_performance.keys())[0] if node_performance else None + if hostname: + perf_data = node_performance[hostname] + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate + ax1.plot( + iterations, + perf_data["insert_rates"], + "b-o", + linewidth=2, + markersize=6, + ) + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title(f"Vector Insert Rate Performance - {hostname}") + ax1.grid(True, alpha=0.3) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + "r-o", + linewidth=2, + markersize=6, + ) + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title(f"Vector Insert Time Performance - {hostname}") + ax2.grid(True, alpha=0.3) plt.tight_layout() output_file = 
os.path.join( @@ -739,52 +1104,110 @@ class ResultsAnalyzer: plt.close() def _plot_query_performance(self): - """Plot query performance metrics""" + """Plot query performance metrics comparing baseline vs dev nodes""" if not self.results_data: return - # Collect query performance data - query_data = [] + # Group data by filesystem configuration + fs_groups = {} for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} + query_perf = result.get("query_performance", {}) - for topk, topk_data in query_perf.items(): - for batch, batch_data in topk_data.items(): - query_data.append( - { - "topk": topk.replace("topk_", ""), - "batch": batch.replace("batch_", ""), - "qps": batch_data.get("queries_per_second", 0), - "avg_time": batch_data.get("average_time_seconds", 0) - * 1000, # Convert to ms - } - ) + if query_perf: + node_type = "dev" if is_dev else "baseline" + for topk, topk_data in query_perf.items(): + for batch, batch_data in topk_data.items(): + fs_groups[config_key][node_type].append( + { + "hostname": hostname, + "topk": topk.replace("topk_", ""), + "batch": batch.replace("batch_", ""), + "qps": batch_data.get("queries_per_second", 0), + "avg_time": batch_data.get("average_time_seconds", 0) + * 1000, + } + ) - if not query_data: + if not fs_groups: return - df = pd.DataFrame(query_data) + # Create subplots for each filesystem config + n_configs = len(fs_groups) + fig_height = max(8, 4 * n_configs) + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height)) - # Create subplots - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + if n_configs == 1: + axes = axes.reshape(1, -1) - # QPS heatmap - qps_pivot = df.pivot_table( - values="qps", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd") - 
ax1.set_title("Queries Per Second (QPS)") - ax1.set_xlabel("Batch Size") - ax1.set_ylabel("Top-K") - - # Latency heatmap - latency_pivot = df.pivot_table( - values="avg_time", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd") - ax2.set_title("Average Query Latency (ms)") - ax2.set_xlabel("Batch Size") - ax2.set_ylabel("Top-K") + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())): + # Create DataFrames for baseline and dev + baseline_df = ( + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame() + ) + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame() + + # Baseline QPS heatmap + ax_base = axes[idx][0] + if not baseline_df.empty: + baseline_pivot = baseline_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + baseline_pivot, + annot=True, + fmt=".1f", + ax=ax_base, + cmap="Greens", + cbar_kws={"label": "QPS"}, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + ax_base.set_xlabel("Batch Size") + ax_base.set_ylabel("Top-K") + else: + ax_base.text( + 0.5, + 0.5, + f"No baseline data for {config_key}", + ha="center", + va="center", + transform=ax_base.transAxes, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + # Dev QPS heatmap + ax_dev = axes[idx][1] + if not dev_df.empty: + dev_pivot = dev_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + dev_pivot, + annot=True, + fmt=".1f", + ax=ax_dev, + cmap="Blues", + cbar_kws={"label": "QPS"}, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + ax_dev.set_xlabel("Batch Size") + ax_dev.set_ylabel("Top-K") + else: + ax_dev.text( + 0.5, + 0.5, + f"No dev data for {config_key}", + ha="center", + va="center", + transform=ax_dev.transAxes, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + + plt.suptitle("Query Performance: Baseline vs Development", 
fontsize=16, y=1.02) plt.tight_layout() output_file = os.path.join( self.output_dir, @@ -796,32 +1219,101 @@ class ResultsAnalyzer: plt.close() def _plot_index_performance(self): - """Plot index creation performance""" - iterations = [] - index_times = [] + """Plot index creation performance comparing baseline vs dev""" + # Group by filesystem configuration + fs_groups = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} - for i, result in enumerate(self.results_data): index_perf = result.get("index_performance", {}) if index_perf: - iterations.append(i + 1) - index_times.append(index_perf.get("creation_time_seconds", 0)) + time = index_perf.get("creation_time_seconds", 0) + if time > 0: + node_type = "dev" if is_dev else "baseline" + fs_groups[config_key][node_type].append(time) - if not index_times: + if not fs_groups: return - plt.figure(figsize=(10, 6)) - plt.bar(iterations, index_times, alpha=0.7, color="green") - plt.xlabel("Iteration") - plt.ylabel("Index Creation Time (seconds)") - plt.title("Index Creation Performance") - plt.grid(True, alpha=0.3) - - # Add average line - avg_time = np.mean(index_times) - plt.axhline( - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s" + # Create comparison bar chart + fig, ax = plt.subplots(figsize=(14, 8)) + + configs = sorted(fs_groups.keys()) + x = np.arange(len(configs)) + width = 0.35 + + # Calculate averages for each config + baseline_avgs = [] + dev_avgs = [] + baseline_stds = [] + dev_stds = [] + + for config in configs: + baseline_times = fs_groups[config]["baseline"] + dev_times = fs_groups[config]["dev"] + + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0) + dev_avgs.append(np.mean(dev_times) if dev_times else 0) + baseline_stds.append(np.std(baseline_times) if 
baseline_times else 0) + dev_stds.append(np.std(dev_times) if dev_times else 0) + + # Create bars + bars1 = ax.bar( + x - width / 2, + baseline_avgs, + width, + yerr=baseline_stds, + label="Baseline", + color="#4CAF50", + capsize=5, + ) + bars2 = ax.bar( + x + width / 2, + dev_avgs, + width, + yerr=dev_stds, + label="Development", + color="#2196F3", + capsize=5, ) - plt.legend() + + # Add value labels on bars + for bar, val in zip(bars1, baseline_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + for bar, val in zip(bars2, dev_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + ax.set_xlabel("Filesystem Configuration", fontsize=12) + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12) + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14) + ax.set_xticks(x) + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right") + ax.legend(loc="upper right") + ax.grid(True, alpha=0.3, axis="y") output_file = os.path.join( self.output_dir, @@ -833,61 +1325,148 @@ class ResultsAnalyzer: plt.close() def _plot_performance_matrix(self): - """Plot comprehensive performance comparison matrix""" + """Plot performance comparison matrix for each filesystem config""" if len(self.results_data) < 2: return - # Extract key metrics for comparison - metrics = [] - for i, result in enumerate(self.results_data): + # Group by filesystem configuration + fs_metrics = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_metrics: + fs_metrics[config_key] = {"baseline": [], "dev": []} + + # Collect metrics insert_perf = result.get("insert_performance", {}) 
index_perf = result.get("index_performance", {}) + query_perf = result.get("query_performance", {}) metric = { - "iteration": i + 1, + "hostname": hostname, "insert_rate": insert_perf.get("vectors_per_second", 0), "index_time": index_perf.get("creation_time_seconds", 0), } - # Add query metrics - query_perf = result.get("query_performance", {}) + # Get representative query performance (topk_10, batch_1) if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: metric["query_qps"] = query_perf["topk_10"]["batch_1"].get( "queries_per_second", 0 ) + else: + metric["query_qps"] = 0 - metrics.append(metric) + node_type = "dev" if is_dev else "baseline" + fs_metrics[config_key][node_type].append(metric) - df = pd.DataFrame(metrics) + if not fs_metrics: + return - # Normalize metrics for comparison - numeric_cols = ["insert_rate", "index_time", "query_qps"] - for col in numeric_cols: - if col in df.columns: - df[f"{col}_norm"] = (df[col] - df[col].min()) / ( - df[col].max() - df[col].min() + 1e-6 - ) + # Create subplots for each filesystem + n_configs = len(fs_metrics) + n_cols = min(3, n_configs) + n_rows = (n_configs + n_cols - 1) // n_cols + + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5)) + if n_rows == 1 and n_cols == 1: + axes = [[axes]] + elif n_rows == 1: + axes = [axes] + elif n_cols == 1: + axes = [[ax] for ax in axes] + + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())): + row = idx // n_cols + col = idx % n_cols + ax = axes[row][col] + + # Calculate averages + baseline_metrics = data["baseline"] + dev_metrics = data["dev"] + + if baseline_metrics and dev_metrics: + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"] + + baseline_avg = [ + np.mean([m["insert_rate"] for m in baseline_metrics]), + np.mean([m["index_time"] for m in baseline_metrics]), + np.mean([m["query_qps"] for m in baseline_metrics]), + ] - # Create radar chart - fig, ax = plt.subplots(figsize=(10, 8), 
subplot_kw=dict(projection="polar")) + dev_avg = [ + np.mean([m["insert_rate"] for m in dev_metrics]), + np.mean([m["index_time"] for m in dev_metrics]), + np.mean([m["query_qps"] for m in dev_metrics]), + ] - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist() - angles += angles[:1] # Complete the circle + x = np.arange(len(categories)) + width = 0.35 - for i, row in df.iterrows(): - values = [row.get(f"{col}_norm", 0) for col in numeric_cols] - values += values[:1] # Complete the circle + bars1 = ax.bar( + x - width / 2, + baseline_avg, + width, + label="Baseline", + color="#4CAF50", + ) + bars2 = ax.bar( + x + width / 2, dev_avg, width, label="Development", color="#2196F3" + ) - ax.plot( - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}' - ) - ax.fill(angles, values, alpha=0.25) + # Add value labels + for bar, val in zip(bars1, baseline_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) - ax.set_xticks(angles[:-1]) - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"]) - ax.set_ylim(0, 1) - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08) - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0)) + for bar, val in zip(bars2, dev_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) + + ax.set_xlabel("Metrics") + ax.set_ylabel("Value") + ax.set_title(f"{config_key.upper()}") + ax.set_xticks(x) + ax.set_xticklabels(categories) + ax.legend(loc="upper right", fontsize=8) + ax.grid(True, alpha=0.3, axis="y") + else: + ax.text( + 0.5, + 0.5, + f"Insufficient data\nfor {config_key}", + ha="center", + va="center", + transform=ax.transAxes, + ) + ax.set_title(f"{config_key.upper()}") + + # Hide unused subplots + for idx 
in range(n_configs, n_rows * n_cols): + row = idx // n_cols + col = idx % n_cols + axes[row][col].set_visible(False) + + plt.suptitle( + "Performance Comparison Matrix: Baseline vs Development", + fontsize=14, + y=1.02, + ) output_file = os.path.join( self.output_dir, @@ -898,6 +1477,149 @@ class ResultsAnalyzer: ) plt.close() + def _plot_filesystem_comparison(self): + """Plot node performance comparison chart""" + if len(self.results_data) < 2: + return + + # Group results by node + node_performance = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "index_times": [], + "query_qps": [], + "is_dev": is_dev, + } + + # Collect metrics + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + + index_perf = result.get("index_performance", {}) + if index_perf: + node_performance[hostname]["index_times"].append( + index_perf.get("creation_time_seconds", 0) + ) + + # Get top-10 batch-1 query performance as representative + query_perf = result.get("query_performance", {}) + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0) + node_performance[hostname]["query_qps"].append(qps) + + # Only create comparison if we have multiple nodes + if len(node_performance) > 1: + # Calculate averages + node_metrics = {} + for hostname, perf_data in node_performance.items(): + node_metrics[hostname] = { + "avg_insert_rate": ( + np.mean(perf_data["insert_rates"]) + if perf_data["insert_rates"] + else 0 + ), + "avg_index_time": ( + np.mean(perf_data["index_times"]) + if perf_data["index_times"] + else 0 + ), + "avg_query_qps": ( + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0 + ), + "is_dev": perf_data["is_dev"], + } + + # Create 
comparison bar chart with more space + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8)) + + # Sort nodes with baseline first + sorted_nodes = sorted( + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + node_names = [hostname for hostname, _ in sorted_nodes] + + # Use different colors for baseline vs dev + colors = [ + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3" + for hostname in node_names + ] + + # Add labels for clarity + labels = [ + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})" + for hostname in node_names + ] + + # Insert rate comparison + insert_rates = [ + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names + ] + bars1 = ax1.bar(labels, insert_rates, color=colors) + ax1.set_title("Average Milvus Insert Rate by Node") + ax1.set_ylabel("Vectors/Second") + # Rotate labels for better readability + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Index time comparison (lower is better) + index_times = [ + node_metrics[hostname]["avg_index_time"] for hostname in node_names + ] + bars2 = ax2.bar(labels, index_times, color=colors) + ax2.set_title("Average Milvus Index Time by Node") + ax2.set_ylabel("Seconds (Lower is Better)") + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Query QPS comparison + query_qps = [ + node_metrics[hostname]["avg_query_qps"] for hostname in node_names + ] + bars3 = ax3.bar(labels, query_qps, color=colors) + ax3.set_title("Average Milvus Query QPS by Node") + ax3.set_ylabel("Queries/Second") + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Add value labels on bars + for bars, values in [ + (bars1, insert_rates), + (bars2, index_times), + (bars3, query_qps), + ]: + for bar, value in zip(bars, values): + height = bar.get_height() + ax = bar.axes + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height + height * 0.01, + f"{value:.1f}", + ha="center", + va="bottom", + 
fontsize=10, + ) + + plt.suptitle( + "Milvus Performance Comparison: Baseline vs Development Nodes", + fontsize=16, + y=1.02, + ) + plt.tight_layout() + + output_file = os.path.join( + self.output_dir, + f"filesystem_comparison.{self.config.get('graph_format', 'png')}", + ) + plt.savefig( + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight" + ) + plt.close() + def analyze(self) -> bool: """Run complete analysis""" self.logger.info("Starting results analysis...") diff --git a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py index 645bac9e..b3681ff9 100755 --- a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py +++ b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py @@ -29,17 +29,18 @@ def extract_filesystem_from_filename(filename): if "_" in node_name: parts = node_name.split("_") node_name = "_".join(parts[:-1]) # Remove last part (iteration) - + # Extract filesystem type from node name if "-xfs-" in node_name: return "xfs" elif "-ext4-" in node_name: - return "ext4" + return "ext4" elif "-btrfs-" in node_name: return "btrfs" - + return "unknown" + def extract_node_config_from_filename(filename): """Extract detailed node configuration from filename""" # Expected format: results_debian13-ai-xfs-4k-4ks_1.json @@ -50,14 +51,15 @@ def extract_node_config_from_filename(filename): if "_" in node_name: parts = node_name.split("_") node_name = "_".join(parts[:-1]) # Remove last part (iteration) - + # Remove -dev suffix if present node_name = node_name.replace("-dev", "") - + return node_name.replace("debian13-ai-", "") - + return "unknown" + def detect_filesystem(): """Detect the filesystem type of /data on test nodes""" # This is now a fallback - we primarily use filename-based detection @@ -104,7 +106,7 @@ def load_results(results_dir): # Extract node type from filename filename = os.path.basename(json_file) data["filename"] = 
filename - + # Extract filesystem type and config from filename data["filesystem"] = extract_filesystem_from_filename(filename) data["node_config"] = extract_node_config_from_filename(filename) diff --git a/playbooks/roles/ai_collect_results/files/generate_graphs.py b/playbooks/roles/ai_collect_results/files/generate_graphs.py index 53a835e2..fafc62bf 100755 --- a/playbooks/roles/ai_collect_results/files/generate_graphs.py +++ b/playbooks/roles/ai_collect_results/files/generate_graphs.py @@ -9,7 +9,6 @@ import sys import glob import numpy as np import matplotlib - matplotlib.use("Agg") # Use non-interactive backend import matplotlib.pyplot as plt from datetime import datetime @@ -17,68 +16,78 @@ from pathlib import Path from collections import defaultdict +def _extract_filesystem_config(result): + """Extract filesystem type and block size from result data. + Returns (fs_type, block_size, config_key)""" + filename = result.get("_file", "") + + # Primary: Extract filesystem type from filename (more reliable than JSON) + fs_type = "unknown" + block_size = "default" + + if "xfs" in filename: + fs_type = "xfs" + # Check larger sizes first to avoid substring matches + if "64k" in filename and "64k-" in filename: + block_size = "64k" + elif "32k" in filename and "32k-" in filename: + block_size = "32k" + elif "16k" in filename and "16k-" in filename: + block_size = "16k" + elif "4k" in filename and "4k-" in filename: + block_size = "4k" + elif "ext4" in filename: + fs_type = "ext4" + if "4k" in filename and "4k-" in filename: + block_size = "4k" + elif "16k" in filename and "16k-" in filename: + block_size = "16k" + elif "btrfs" in filename: + fs_type = "btrfs" + + # Fallback: Check JSON data if filename parsing failed + if fs_type == "unknown": + fs_type = result.get("filesystem", "unknown") + + # Create descriptive config key + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type + return fs_type, block_size, config_key + + +def 
_extract_node_info(result): + """Extract node hostname and determine if it's a dev node. + Returns (hostname, is_dev_node)""" + # Get hostname from system_info (preferred) or fall back to filename + system_info = result.get("system_info", {}) + hostname = system_info.get("hostname", "") + + # If no hostname in system_info, try extracting from filename + if not hostname: + filename = result.get("_file", "") + # Remove results_ prefix and .json suffix + hostname = filename.replace("results_", "").replace(".json", "") + # Remove iteration number if present (_1, _2, etc.) + if "_" in hostname and hostname.split("_")[-1].isdigit(): + hostname = "_".join(hostname.split("_")[:-1]) + + # Determine if this is a dev node + is_dev = hostname.endswith("-dev") + + return hostname, is_dev + + def load_results(results_dir): """Load all JSON result files from the directory""" results = [] - json_files = glob.glob(os.path.join(results_dir, "*.json")) + # Only load results_*.json files, not consolidated or other JSON files + json_files = glob.glob(os.path.join(results_dir, "results_*.json")) for json_file in json_files: try: with open(json_file, "r") as f: data = json.load(f) - # Extract filesystem info - prefer from JSON data over filename - filename = os.path.basename(json_file) - - # First, try to get filesystem from the JSON data itself - fs_type = data.get("filesystem", None) - - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type: - parts = filename.replace("results_", "").replace(".json", "").split("-") - - # Parse host info - if "debian13-ai-" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") - .replace("_2.json", "") - .replace("_3.json", "") - .split("-") - ) - if "xfs" in host_parts[0]: - fs_type = "xfs" - # Extract block size (e.g., "4k", "16k", etc.) 
- block_size = host_parts[1] if len(host_parts) > 1 else "unknown" - elif "ext4" in host_parts[0]: - fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: - fs_type = "btrfs" - block_size = "default" - else: - fs_type = "unknown" - block_size = "unknown" - else: - fs_type = "unknown" - block_size = "unknown" - else: - # If filesystem came from JSON, set appropriate block size - if fs_type == "btrfs": - block_size = "default" - elif fs_type in ["ext4", "xfs"]: - block_size = data.get("block_size", "4k") - else: - block_size = data.get("block_size", "default") - - is_dev = "dev" in filename - - # Use filesystem from JSON if available, otherwise use parsed value - if "filesystem" not in data: - data["filesystem"] = fs_type - data["block_size"] = block_size - data["is_dev"] = is_dev - data["filename"] = filename - + # Add filename for filesystem detection + data["_file"] = os.path.basename(json_file) results.append(data) except Exception as e: print(f"Error loading {json_file}: {e}") @@ -86,554 +95,243 @@ def load_results(results_dir): return results -def create_filesystem_comparison_chart(results, output_dir): - """Create a bar chart comparing performance across filesystems""" - # Group by filesystem and baseline/dev - fs_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract actual performance data from results - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - fs_data[fs][category].append(insert_qps) - - # Prepare data for plotting - filesystems = list(fs_data.keys()) - baseline_means = [ - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0 - for fs in filesystems - ] - dev_means = [ - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in 
filesystems
-    ]
-
-    x = np.arange(len(filesystems))
-    width = 0.35
-
-    fig, ax = plt.subplots(figsize=(10, 6))
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
-    )
-
-    ax.set_xlabel("Filesystem")
-    ax.set_ylabel("Insert QPS")
-    ax.set_title("Vector Database Performance by Filesystem")
-    ax.set_xticks(x)
-    ax.set_xticklabels(filesystems)
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels on bars
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.0f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                )
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
-    plt.close()
-
-
-def create_block_size_analysis(results, output_dir):
-    """Create analysis for different block sizes (XFS specific)"""
-    # Filter XFS results
-    xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
-
-    if not xfs_results:
+def create_simple_performance_trends(results, output_dir):
+    """Create performance trends chart grouped by filesystem configuration"""
+    if not results:
         return
-    # Group by block size
-    block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
-    for result in xfs_results:
-        block_size = result.get("block_size", "unknown")
-        category = "dev" if result.get("is_dev", False) else "baseline"
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-        block_size_data[block_size][category].append(insert_qps)
-
-    # Sort block sizes
-    block_sizes = sorted(
-        block_size_data.keys(),
-        key=lambda x: (
-            int(x.replace("k", "").replace("s", ""))
-            if x not in ["unknown", "default"]
-            else 0
-        ),
-    )
-
-    # Create grouped bar chart
-    baseline_means = [
-        (
-            np.mean(block_size_data[bs]["baseline"])
-            if block_size_data[bs]["baseline"]
-            else 0
-        )
-        for bs in block_sizes
-    ]
-    dev_means = [
-        np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
-        for bs in block_sizes
-    ]
-
-    x = np.arange(len(block_sizes))
-    width = 0.35
-
-    fig, ax = plt.subplots(figsize=(12, 6))
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_means, width, label="Development", color="#d62728"
-    )
-
-    ax.set_xlabel("Block Size")
-    ax.set_ylabel("Insert QPS")
-    ax.set_title("XFS Performance by Block Size")
-    ax.set_xticks(x)
-    ax.set_xticklabels(block_sizes)
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.0f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                )
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
-    plt.close()
-
-
-def create_heatmap_analysis(results, output_dir):
-    """Create a heatmap showing performance across all configurations"""
-    # Group data by configuration and version
-    config_data = defaultdict(
-        lambda: {
-            "baseline": {"insert": 0, "query": 0},
-            "dev": {"insert": 0, "query": 0},
-        }
-    )
+    # Group results by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "insert_rates": [],
+        "insert_times": [],
+        "iterations": [],
+    })
 
     for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{fs}-{block_size}"
-        version = "dev" if result.get("is_dev", False) else "baseline"
-
-        # Get actual insert performance
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        config_data[config][version]["insert"] = insert_qps
-        config_data[config][version]["query"] = query_qps
-
-    # Sort configurations
-    configs = sorted(config_data.keys())
-
-    # Prepare data for heatmap
-    insert_baseline = [config_data[c]["baseline"]["insert"] for c in configs]
-    insert_dev = [config_data[c]["dev"]["insert"] for c in configs]
-    query_baseline = [config_data[c]["baseline"]["query"] for c in configs]
-    query_dev = [config_data[c]["dev"]["query"] for c in configs]
-
-    # Create figure with custom heatmap
-    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
-
-    # Create data matrices
-    insert_data = np.array([insert_baseline, insert_dev]).T
-    query_data = np.array([query_baseline, query_dev]).T
-
-    # Insert QPS heatmap
-    im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
-    ax1.set_xticks([0, 1])
-    ax1.set_xticklabels(["Baseline", "Development"])
-    ax1.set_yticks(range(len(configs)))
-    ax1.set_yticklabels(configs)
-    ax1.set_title("Insert Performance Heatmap")
-    ax1.set_ylabel("Configuration")
-
-    # Add text annotations
-    for i in range(len(configs)):
-        for j in range(2):
-            text = ax1.text(
-                j,
-                i,
-                f"{int(insert_data[i, j])}",
-                ha="center",
-                va="center",
-                color="black",
-            )
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
 
-    # Add colorbar
-    cbar1 = plt.colorbar(im1, ax=ax1)
-    cbar1.set_label("Insert QPS")
-
-    # Query QPS heatmap
-    im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
-    ax2.set_xticks([0, 1])
-    ax2.set_xticklabels(["Baseline", "Development"])
-    ax2.set_yticks(range(len(configs)))
-    ax2.set_yticklabels(configs)
-    ax2.set_title("Query Performance Heatmap")
-
-    # Add text annotations
-    for i in range(len(configs)):
-        for j in range(2):
-            text = ax2.text(
-                j,
-                i,
-                f"{int(query_data[i, j])}",
-                ha="center",
-                va="center",
-                color="black",
+        insert_perf = result.get("insert_performance", {})
+        if insert_perf:
+            fs_performance[config_key]["insert_rates"].append(
+                insert_perf.get("vectors_per_second", 0)
+            )
+            fs_performance[config_key]["insert_times"].append(
+                insert_perf.get("total_time_seconds", 0)
+            )
+            fs_performance[config_key]["iterations"].append(
+                len(fs_performance[config_key]["insert_rates"])
             )
 
-    # Add colorbar
-    cbar2 = plt.colorbar(im2, ax=ax2)
-    cbar2.set_label("Query QPS")
-
-    plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150)
-    plt.close()
-
-
-def create_performance_trends(results, output_dir):
-    """Create line charts showing performance trends"""
-    # Group by filesystem type
-    fs_types = defaultdict(
-        lambda: {
-            "configs": [],
-            "baseline_insert": [],
-            "dev_insert": [],
-            "baseline_query": [],
-            "dev_query": [],
-        }
-    )
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{block_size}"
-
-        if config not in fs_types[fs]["configs"]:
-            fs_types[fs]["configs"].append(config)
-            fs_types[fs]["baseline_insert"].append(0)
-            fs_types[fs]["dev_insert"].append(0)
-            fs_types[fs]["baseline_query"].append(0)
-            fs_types[fs]["dev_query"].append(0)
-
-        idx = fs_types[fs]["configs"].index(config)
-
-        # Calculate average query QPS from all test configurations
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        if result.get("is_dev", False):
-            if "insert_performance" in result:
-                fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["dev_query"][idx] = query_qps
-        else:
-            if "insert_performance" in result:
-                fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["baseline_query"][idx] = query_qps
-
-    # Create separate plots for each filesystem
-    for fs, data in fs_types.items():
-        if not data["configs"]:
-            continue
-
-        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-        x = range(len(data["configs"]))
-
-        # Insert performance
-        ax1.plot(
-            x,
-            data["baseline_insert"],
-            "o-",
-            label="Baseline",
-            linewidth=2,
-            markersize=8,
-        )
-        ax1.plot(
-            x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax1.set_xlabel("Configuration")
-        ax1.set_ylabel("Insert QPS")
-        ax1.set_title(f"{fs.upper()} Insert Performance")
-        ax1.set_xticks(x)
-        ax1.set_xticklabels(data["configs"])
-        ax1.legend()
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate lines for each filesystem
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        colors = ["b", "r", "g", "m", "c", "y", "k"]
+        color_idx = 0
+
+        for config_key, perf_data in fs_performance.items():
+            if not perf_data["insert_rates"]:
+                continue
+
+            color = colors[color_idx % len(colors)]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=config_key.upper(),
+            )
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=config_key.upper(),
+            )
+
+            color_idx += 1
+
+        ax1.set_xlabel("Iteration")
+        ax1.set_ylabel("Vectors/Second")
+        ax1.set_title("Milvus Insert Rate by Storage Filesystem")
         ax1.grid(True, alpha=0.3)
-
-        # Query performance
-        ax2.plot(
-            x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
-        )
-        ax2.plot(
-            x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax2.set_xlabel("Configuration")
-        ax2.set_ylabel("Query QPS")
-        ax2.set_title(f"{fs.upper()} Query Performance")
-        ax2.set_xticks(x)
-        ax2.set_xticklabels(data["configs"])
-        ax2.legend()
+        ax1.legend()
+
+        ax2.set_xlabel("Iteration")
+        ax2.set_ylabel("Total Time (seconds)")
+        ax2.set_title("Milvus Insert Time by Storage Filesystem")
         ax2.grid(True, alpha=0.3)
-
-        plt.tight_layout()
-        plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
-        plt.close()
+        ax2.legend()
+    else:
+        # Single filesystem mode: original behavior
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        # Extract insert data from single filesystem
+        config_key = list(fs_performance.keys())[0] if fs_performance else None
+        if config_key:
+            perf_data = fs_performance[config_key]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                "b-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax1.set_xlabel("Iteration")
+            ax1.set_ylabel("Vectors/Second")
+            ax1.set_title("Vector Insert Rate Performance")
+            ax1.grid(True, alpha=0.3)
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                "r-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax2.set_xlabel("Iteration")
+            ax2.set_ylabel("Total Time (seconds)")
+            ax2.set_title("Vector Insert Time Performance")
+            ax2.grid(True, alpha=0.3)
+
+    plt.tight_layout()
+    plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+    plt.close()
 
 
-def create_simple_performance_trends(results, output_dir):
-    """Create a simple performance trends chart for basic Milvus testing"""
+def create_heatmap_analysis(results, output_dir):
+    """Create multi-filesystem heatmap showing query performance"""
     if not results:
         return
-
-    # Separate baseline and dev results
-    baseline_results = [r for r in results if not r.get("is_dev", False)]
-    dev_results = [r for r in results if r.get("is_dev", False)]
-
-    if not baseline_results and not dev_results:
-        return
-
-    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-    # Prepare data
-    baseline_insert = []
-    baseline_query = []
-    dev_insert = []
-    dev_query = []
-    labels = []
-
-    # Process baseline results
-    for i, result in enumerate(baseline_results):
-        if "insert_performance" in result:
-            baseline_insert.append(result["insert_performance"].get("vectors_per_second", 0))
-        else:
-            baseline_insert.append(0)
+
+    # Group data by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "query_data": [],
+        "config_key": "",
+    })
+
+    for result in results:
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
 
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        baseline_query.append(query_qps)
-        labels.append(f"Run {i+1}")
-
-    # Process dev results
-    for result in dev_results:
-        if "insert_performance" in result:
-            dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
-        else:
-            dev_insert.append(0)
+
+        query_perf = result.get("query_performance", {})
+        for topk, topk_data in query_perf.items():
+            for batch, batch_data in topk_data.items():
+                qps = batch_data.get("queries_per_second", 0)
+                fs_performance[config_key]["query_data"].append({
+                    "topk": topk,
+                    "batch": batch,
+                    "qps": qps,
+                })
+        fs_performance[config_key]["config_key"] = config_key
+
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate heatmaps for each filesystem
+        num_fs = len(fs_performance)
+        fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
+        if num_fs == 1:
+            axes = [axes]
 
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        dev_query.append(query_qps)
-
-    x = range(len(baseline_results) if baseline_results else len(dev_results))
-
-    # Insert performance
-    if baseline_insert:
-        ax1.plot(x, baseline_insert, "o-", label="Baseline", linewidth=2, markersize=8)
-    if dev_insert:
-        ax1.plot(x[:len(dev_insert)], dev_insert, "s-", label="Development", linewidth=2, markersize=8)
-    ax1.set_xlabel("Test Run")
-    ax1.set_ylabel("Insert QPS")
-    ax1.set_title("Milvus Insert Performance")
-    ax1.set_xticks(x)
-    ax1.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
-    ax1.legend()
-    ax1.grid(True, alpha=0.3)
-
-    # Query performance
-    if baseline_query:
-        ax2.plot(x, baseline_query, "o-", label="Baseline", linewidth=2, markersize=8)
-    if dev_query:
-        ax2.plot(x[:len(dev_query)], dev_query, "s-", label="Development", linewidth=2, markersize=8)
-    ax2.set_xlabel("Test Run")
-    ax2.set_ylabel("Query QPS")
-    ax2.set_title("Milvus Query Performance")
-    ax2.set_xticks(x)
-    ax2.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
-    ax2.legend()
-    ax2.grid(True, alpha=0.3)
+        # Define common structure for consistency
+        topk_order = ["topk_1", "topk_10", "topk_100"]
+        batch_order = ["batch_1", "batch_10", "batch_100"]
+
+        for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+            # Create matrix for this filesystem
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
+            axes[idx].set_title(f"{config_key.upper()} Query Performance")
+            axes[idx].set_xticks(range(len(batch_order)))
+            axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            axes[idx].set_yticks(range(len(topk_order)))
+            axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
+                                   ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=axes[idx])
+            cbar.set_label('Queries Per Second (QPS)')
+    else:
+        # Single filesystem mode
+        fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+        if fs_performance:
+            config_key = list(fs_performance.keys())[0]
+            perf_data = fs_performance[config_key]
+
+            # Create matrix
+            topk_order = ["topk_1", "topk_10", "topk_100"]
+            batch_order = ["batch_1", "batch_10", "batch_100"]
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = ax.imshow(matrix, cmap='viridis', aspect='auto')
+            ax.set_title("Milvus Query Performance Heatmap")
+            ax.set_xticks(range(len(batch_order)))
+            ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            ax.set_yticks(range(len(topk_order)))
+            ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    ax.text(j, i, f'{matrix[i, j]:.0f}',
+                            ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=ax)
+            cbar.set_label('Queries Per Second (QPS)')
 
     plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
     plt.close()
 
 
-def generate_summary_statistics(results, output_dir):
-    """Generate summary statistics and save to JSON"""
-    summary = {
-        "total_tests": len(results),
-        "filesystems_tested": list(
-            set(r.get("filesystem", "unknown") for r in results)
-        ),
-        "configurations": {},
-        "performance_summary": {
-            "best_insert_qps": {"value": 0, "config": ""},
-            "best_query_qps": {"value": 0, "config": ""},
-            "average_insert_qps": 0,
-            "average_query_qps": 0,
-        },
-    }
-
-    # Calculate statistics
-    all_insert_qps = []
-    all_query_qps = []
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        is_dev = "dev" if result.get("is_dev", False) else "baseline"
-        config_name = f"{fs}-{block_size}-{is_dev}"
-
-        # Get actual performance metrics
-        if "insert_performance" in result:
-            insert_qps = result["insert_performance"].get("vectors_per_second", 0)
-        else:
-            insert_qps = 0
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        all_insert_qps.append(insert_qps)
-        all_query_qps.append(query_qps)
-
-        summary["configurations"][config_name] = {
-            "insert_qps": insert_qps,
-            "query_qps": query_qps,
-            "host": result.get("host", "unknown"),
-        }
-
-        if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
-            summary["performance_summary"]["best_insert_qps"] = {
-                "value": insert_qps,
-                "config": config_name,
-            }
-
-        if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
-            summary["performance_summary"]["best_query_qps"] = {
-                "value": query_qps,
-                "config": config_name,
-            }
-
-    summary["performance_summary"]["average_insert_qps"] = (
-        np.mean(all_insert_qps) if all_insert_qps else 0
-    )
-    summary["performance_summary"]["average_query_qps"] = (
-        np.mean(all_query_qps) if all_query_qps else 0
-    )
-
-    # Save summary
-    with open(os.path.join(output_dir, "summary.json"), "w") as f:
-        json.dump(summary, f, indent=2)
-
-    return summary
-
-
 def main():
     if len(sys.argv) < 3:
         print("Usage: generate_graphs.py <results_dir> <output_dir>")
@@ -642,37 +340,23 @@ def main():
     results_dir = sys.argv[1]
     output_dir = sys.argv[2]
 
-    # Create output directory
+    # Ensure output directory exists
    os.makedirs(output_dir, exist_ok=True)
 
     # Load results
     results = load_results(results_dir)
-
     if not results:
-        print("No results found to analyze")
+        print(f"No valid results found in {results_dir}")
         sys.exit(1)
 
     print(f"Loaded {len(results)} result files")
 
     # Generate graphs
-    print("Generating performance heatmap...")
-    create_heatmap_analysis(results, output_dir)
-
-    print("Generating performance trends...")
     create_simple_performance_trends(results, output_dir)
+    create_heatmap_analysis(results, output_dir)
 
-    print("Generating summary statistics...")
-    summary = generate_summary_statistics(results, output_dir)
-
-    print(f"\nAnalysis complete! Graphs saved to {output_dir}")
-    print(f"Total configurations tested: {summary['total_tests']}")
-    print(
-        f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
-    )
-    print(
-        f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
-    )
+    print(f"Graphs generated in {output_dir}")
 
 
 if __name__ == "__main__":
-    main()
+    main()
\ No newline at end of file
diff --git a/playbooks/roles/ai_collect_results/files/generate_html_report.py b/playbooks/roles/ai_collect_results/files/generate_html_report.py
index a205577c..01ec734c 100755
--- a/playbooks/roles/ai_collect_results/files/generate_html_report.py
+++ b/playbooks/roles/ai_collect_results/files/generate_html_report.py
@@ -69,6 +69,24 @@ HTML_TEMPLATE = """
         color: #7f8c8d;
         font-size: 0.9em;
     }}
+    .config-box {{
+        background: #f8f9fa;
+        border-left: 4px solid #3498db;
+        padding: 15px;
+        margin: 20px 0;
+        border-radius: 4px;
+    }}
+    .config-box h3 {{
+        margin-top: 0;
+        color: #2c3e50;
+    }}
+    .config-box ul {{
+        margin: 10px 0;
+        padding-left: 20px;
+    }}
+    .config-box li {{
+        margin: 5px 0;
+    }}
     .section {{
         background: white;
         padding: 30px;
@@ -162,15 +180,16 @@ HTML_TEMPLATE = """
    -

    AI Vector Database Benchmark Results

    +

    Milvus Vector Database Benchmark Results

    Generated on {timestamp}
    @@ -192,34 +211,40 @@ HTML_TEMPLATE = """
    {best_query_config}
-

Test Runs

-
{total_tests}
-
Benchmark Executions
+

{fourth_card_title}

+
{fourth_card_value}
+
{fourth_card_label}
-
-

Performance Metrics

-

Key performance indicators for Milvus vector database operations.

+ {filesystem_comparison_section} + + {block_size_analysis_section} + +
+

Performance Heatmap

+

Heatmap visualization showing performance metrics across all tested configurations.

- Performance Metrics + Performance Heatmap
- ") # Footer + # Performance Graphs Section + html.append("
") + html.append("

📈 Performance Visualizations

") + html.append( + "

The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:

" + ) + html.append("
    ") + html.append( + "
  • Insert Performance: Shows vector insertion rates and times for each filesystem configuration
  • " + ) + html.append( + "
  • Query Performance: Displays query performance heatmaps for different Top-K and batch sizes
  • " + ) + html.append( + "
  • Index Performance: Compares index creation times across filesystems
  • " + ) + html.append( + "
  • Performance Matrix: Comprehensive comparison matrix of all metrics
  • " + ) + html.append( + "
  • Filesystem Comparison: Side-by-side comparison of filesystem performance
  • " + ) + html.append("
") + html.append( + "

Note: Graphs are generated as separate PNG files in the same directory as this report.

" + ) + html.append("
") + html.append( + " Insert Performance" + ) + html.append( + " Query Performance" + ) + html.append( + " Index Performance" + ) + html.append( + " Performance Matrix" + ) + html.append( + " Filesystem Comparison" + ) + html.append("
") + html.append("
") + html.append("
") html.append("

📝 Notes

") html.append("
    ") @@ -661,10 +868,11 @@ class ResultsAnalyzer: return "\n".join(html) except Exception as e: - self.logger.error(f"Error generating HTML report: {e}") - return ( - f"

    Error generating HTML report: {e}

    " - ) + import traceback + + tb = traceback.format_exc() + self.logger.error(f"Error generating HTML report: {e}\n{tb}") + return f"

    Error generating HTML report: {e}

    {tb}
    " def generate_graphs(self) -> bool: """Generate performance visualization graphs""" @@ -691,6 +899,9 @@ class ResultsAnalyzer: # Graph 4: Performance Comparison Matrix self._plot_performance_matrix() + # Graph 5: Multi-filesystem Comparison (if applicable) + self._plot_filesystem_comparison() + self.logger.info("Graphs generated successfully") return True @@ -699,34 +910,188 @@ class ResultsAnalyzer: return False def _plot_insert_performance(self): - """Plot insert performance metrics""" - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + """Plot insert performance metrics with node differentiation""" + # Group data by node + node_performance = {} - # Extract insert data - iterations = [] - insert_rates = [] - insert_times = [] + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": is_dev, + } - for i, result in enumerate(self.results_data): insert_perf = result.get("insert_performance", {}) if insert_perf: - iterations.append(i + 1) - insert_rates.append(insert_perf.get("vectors_per_second", 0)) - insert_times.append(insert_perf.get("total_time_seconds", 0)) - - # Plot insert rate - ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6) - ax1.set_xlabel("Iteration") - ax1.set_ylabel("Vectors/Second") - ax1.set_title("Vector Insert Rate Performance") - ax1.grid(True, alpha=0.3) - - # Plot insert time - ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6) - ax2.set_xlabel("Iteration") - ax2.set_ylabel("Total Time (seconds)") - ax2.set_title("Vector Insert Time Performance") - ax2.grid(True, alpha=0.3) + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + node_performance[hostname]["insert_times"].append( + insert_perf.get("total_time_seconds", 0) + ) + 
node_performance[hostname]["iterations"].append( + len(node_performance[hostname]["insert_rates"]) + ) + + # Check if we have multiple nodes + if len(node_performance) > 1: + # Multi-node mode: separate lines for each node + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7)) + + # Sort nodes with baseline first, then dev + sorted_nodes = sorted( + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + + # Create color palettes for baseline and dev nodes + baseline_colors = [ + "#2E7D32", + "#43A047", + "#66BB6A", + "#81C784", + "#A5D6A7", + "#C8E6C9", + ] # Greens + dev_colors = [ + "#0D47A1", + "#1565C0", + "#1976D2", + "#1E88E5", + "#2196F3", + "#42A5F5", + "#64B5F6", + ] # Blues + + # Additional colors if needed + extra_colors = [ + "#E65100", + "#F57C00", + "#FF9800", + "#FFB300", + "#FFC107", + "#FFCA28", + ] # Oranges + + # Line styles to cycle through + line_styles = ["-", "--", "-.", ":"] + markers = ["o", "s", "^", "v", "D", "p", "*", "h"] + + baseline_idx = 0 + dev_idx = 0 + + # Use different colors and styles for each node + for idx, (hostname, perf_data) in enumerate(sorted_nodes): + if not perf_data["insert_rates"]: + continue + + # Choose color and style based on node type and index + if perf_data["is_dev"]: + # Development nodes - blues + color = dev_colors[dev_idx % len(dev_colors)] + linestyle = line_styles[ + (dev_idx // len(dev_colors)) % len(line_styles) + ] + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev + label = f"{hostname} (Dev)" + dev_idx += 1 + else: + # Baseline nodes - greens + color = baseline_colors[baseline_idx % len(baseline_colors)] + linestyle = line_styles[ + (baseline_idx // len(baseline_colors)) % len(line_styles) + ] + marker = markers[ + baseline_idx % 4 + ] # Use first 4 markers for baseline + label = f"{hostname} (Baseline)" + baseline_idx += 1 + + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate with alpha for better visibility + ax1.plot( + iterations, 
+ perf_data["insert_rates"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + color=color, + linestyle=linestyle, + marker=marker, + linewidth=1.5, + markersize=5, + label=label, + alpha=0.8, + ) + + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title("Milvus Insert Rate by Node") + ax1.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Milvus Insert Time by Node") + ax2.grid(True, alpha=0.3) + # Position legend outside plot area for better visibility with many nodes + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1) + + plt.suptitle( + "Insert Performance Analysis: Baseline vs Development", + fontsize=14, + y=1.02, + ) + else: + # Single node mode: original behavior + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + + # Extract insert data from single node + hostname = list(node_performance.keys())[0] if node_performance else None + if hostname: + perf_data = node_performance[hostname] + iterations = list(range(1, len(perf_data["insert_rates"]) + 1)) + + # Plot insert rate + ax1.plot( + iterations, + perf_data["insert_rates"], + "b-o", + linewidth=2, + markersize=6, + ) + ax1.set_xlabel("Iteration") + ax1.set_ylabel("Vectors/Second") + ax1.set_title(f"Vector Insert Rate Performance - {hostname}") + ax1.grid(True, alpha=0.3) + + # Plot insert time + ax2.plot( + iterations, + perf_data["insert_times"], + "r-o", + linewidth=2, + markersize=6, + ) + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title(f"Vector Insert Time Performance - {hostname}") + ax2.grid(True, alpha=0.3) plt.tight_layout() output_file = 
os.path.join( @@ -739,52 +1104,110 @@ class ResultsAnalyzer: plt.close() def _plot_query_performance(self): - """Plot query performance metrics""" + """Plot query performance metrics comparing baseline vs dev nodes""" if not self.results_data: return - # Collect query performance data - query_data = [] + # Group data by filesystem configuration + fs_groups = {} for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} + query_perf = result.get("query_performance", {}) - for topk, topk_data in query_perf.items(): - for batch, batch_data in topk_data.items(): - query_data.append( - { - "topk": topk.replace("topk_", ""), - "batch": batch.replace("batch_", ""), - "qps": batch_data.get("queries_per_second", 0), - "avg_time": batch_data.get("average_time_seconds", 0) - * 1000, # Convert to ms - } - ) + if query_perf: + node_type = "dev" if is_dev else "baseline" + for topk, topk_data in query_perf.items(): + for batch, batch_data in topk_data.items(): + fs_groups[config_key][node_type].append( + { + "hostname": hostname, + "topk": topk.replace("topk_", ""), + "batch": batch.replace("batch_", ""), + "qps": batch_data.get("queries_per_second", 0), + "avg_time": batch_data.get("average_time_seconds", 0) + * 1000, + } + ) - if not query_data: + if not fs_groups: return - df = pd.DataFrame(query_data) + # Create subplots for each filesystem config + n_configs = len(fs_groups) + fig_height = max(8, 4 * n_configs) + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height)) - # Create subplots - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6)) + if n_configs == 1: + axes = axes.reshape(1, -1) - # QPS heatmap - qps_pivot = df.pivot_table( - values="qps", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd") - 
ax1.set_title("Queries Per Second (QPS)") - ax1.set_xlabel("Batch Size") - ax1.set_ylabel("Top-K") - - # Latency heatmap - latency_pivot = df.pivot_table( - values="avg_time", index="topk", columns="batch", aggfunc="mean" - ) - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd") - ax2.set_title("Average Query Latency (ms)") - ax2.set_xlabel("Batch Size") - ax2.set_ylabel("Top-K") + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())): + # Create DataFrames for baseline and dev + baseline_df = ( + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame() + ) + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame() + + # Baseline QPS heatmap + ax_base = axes[idx][0] + if not baseline_df.empty: + baseline_pivot = baseline_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + baseline_pivot, + annot=True, + fmt=".1f", + ax=ax_base, + cmap="Greens", + cbar_kws={"label": "QPS"}, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + ax_base.set_xlabel("Batch Size") + ax_base.set_ylabel("Top-K") + else: + ax_base.text( + 0.5, + 0.5, + f"No baseline data for {config_key}", + ha="center", + va="center", + transform=ax_base.transAxes, + ) + ax_base.set_title(f"{config_key.upper()} - Baseline QPS") + # Dev QPS heatmap + ax_dev = axes[idx][1] + if not dev_df.empty: + dev_pivot = dev_df.pivot_table( + values="qps", index="topk", columns="batch", aggfunc="mean" + ) + sns.heatmap( + dev_pivot, + annot=True, + fmt=".1f", + ax=ax_dev, + cmap="Blues", + cbar_kws={"label": "QPS"}, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + ax_dev.set_xlabel("Batch Size") + ax_dev.set_ylabel("Top-K") + else: + ax_dev.text( + 0.5, + 0.5, + f"No dev data for {config_key}", + ha="center", + va="center", + transform=ax_dev.transAxes, + ) + ax_dev.set_title(f"{config_key.upper()} - Development QPS") + + plt.suptitle("Query Performance: Baseline vs Development", 
fontsize=16, y=1.02) plt.tight_layout() output_file = os.path.join( self.output_dir, @@ -796,32 +1219,101 @@ class ResultsAnalyzer: plt.close() def _plot_index_performance(self): - """Plot index creation performance""" - iterations = [] - index_times = [] + """Plot index creation performance comparing baseline vs dev""" + # Group by filesystem configuration + fs_groups = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_groups: + fs_groups[config_key] = {"baseline": [], "dev": []} - for i, result in enumerate(self.results_data): index_perf = result.get("index_performance", {}) if index_perf: - iterations.append(i + 1) - index_times.append(index_perf.get("creation_time_seconds", 0)) + time = index_perf.get("creation_time_seconds", 0) + if time > 0: + node_type = "dev" if is_dev else "baseline" + fs_groups[config_key][node_type].append(time) - if not index_times: + if not fs_groups: return - plt.figure(figsize=(10, 6)) - plt.bar(iterations, index_times, alpha=0.7, color="green") - plt.xlabel("Iteration") - plt.ylabel("Index Creation Time (seconds)") - plt.title("Index Creation Performance") - plt.grid(True, alpha=0.3) - - # Add average line - avg_time = np.mean(index_times) - plt.axhline( - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s" + # Create comparison bar chart + fig, ax = plt.subplots(figsize=(14, 8)) + + configs = sorted(fs_groups.keys()) + x = np.arange(len(configs)) + width = 0.35 + + # Calculate averages for each config + baseline_avgs = [] + dev_avgs = [] + baseline_stds = [] + dev_stds = [] + + for config in configs: + baseline_times = fs_groups[config]["baseline"] + dev_times = fs_groups[config]["dev"] + + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0) + dev_avgs.append(np.mean(dev_times) if dev_times else 0) + baseline_stds.append(np.std(baseline_times) if 
baseline_times else 0) + dev_stds.append(np.std(dev_times) if dev_times else 0) + + # Create bars + bars1 = ax.bar( + x - width / 2, + baseline_avgs, + width, + yerr=baseline_stds, + label="Baseline", + color="#4CAF50", + capsize=5, + ) + bars2 = ax.bar( + x + width / 2, + dev_avgs, + width, + yerr=dev_stds, + label="Development", + color="#2196F3", + capsize=5, ) - plt.legend() + + # Add value labels on bars + for bar, val in zip(bars1, baseline_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + for bar, val in zip(bars2, dev_avgs): + if val > 0: + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.3f}s", + ha="center", + va="bottom", + fontsize=9, + ) + + ax.set_xlabel("Filesystem Configuration", fontsize=12) + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12) + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14) + ax.set_xticks(x) + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right") + ax.legend(loc="upper right") + ax.grid(True, alpha=0.3, axis="y") output_file = os.path.join( self.output_dir, @@ -833,61 +1325,148 @@ class ResultsAnalyzer: plt.close() def _plot_performance_matrix(self): - """Plot comprehensive performance comparison matrix""" + """Plot performance comparison matrix for each filesystem config""" if len(self.results_data) < 2: return - # Extract key metrics for comparison - metrics = [] - for i, result in enumerate(self.results_data): + # Group by filesystem configuration + fs_metrics = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + fs_type, block_size, config_key = self._extract_filesystem_config(result) + + if config_key not in fs_metrics: + fs_metrics[config_key] = {"baseline": [], "dev": []} + + # Collect metrics insert_perf = result.get("insert_performance", {}) 
index_perf = result.get("index_performance", {}) + query_perf = result.get("query_performance", {}) metric = { - "iteration": i + 1, + "hostname": hostname, "insert_rate": insert_perf.get("vectors_per_second", 0), "index_time": index_perf.get("creation_time_seconds", 0), } - # Add query metrics - query_perf = result.get("query_performance", {}) + # Get representative query performance (topk_10, batch_1) if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: metric["query_qps"] = query_perf["topk_10"]["batch_1"].get( "queries_per_second", 0 ) + else: + metric["query_qps"] = 0 - metrics.append(metric) + node_type = "dev" if is_dev else "baseline" + fs_metrics[config_key][node_type].append(metric) - df = pd.DataFrame(metrics) + if not fs_metrics: + return - # Normalize metrics for comparison - numeric_cols = ["insert_rate", "index_time", "query_qps"] - for col in numeric_cols: - if col in df.columns: - df[f"{col}_norm"] = (df[col] - df[col].min()) / ( - df[col].max() - df[col].min() + 1e-6 - ) + # Create subplots for each filesystem + n_configs = len(fs_metrics) + n_cols = min(3, n_configs) + n_rows = (n_configs + n_cols - 1) // n_cols + + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5)) + if n_rows == 1 and n_cols == 1: + axes = [[axes]] + elif n_rows == 1: + axes = [axes] + elif n_cols == 1: + axes = [[ax] for ax in axes] + + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())): + row = idx // n_cols + col = idx % n_cols + ax = axes[row][col] + + # Calculate averages + baseline_metrics = data["baseline"] + dev_metrics = data["dev"] + + if baseline_metrics and dev_metrics: + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"] + + baseline_avg = [ + np.mean([m["insert_rate"] for m in baseline_metrics]), + np.mean([m["index_time"] for m in baseline_metrics]), + np.mean([m["query_qps"] for m in baseline_metrics]), + ] - # Create radar chart - fig, ax = plt.subplots(figsize=(10, 8), 
subplot_kw=dict(projection="polar")) + dev_avg = [ + np.mean([m["insert_rate"] for m in dev_metrics]), + np.mean([m["index_time"] for m in dev_metrics]), + np.mean([m["query_qps"] for m in dev_metrics]), + ] - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist() - angles += angles[:1] # Complete the circle + x = np.arange(len(categories)) + width = 0.35 - for i, row in df.iterrows(): - values = [row.get(f"{col}_norm", 0) for col in numeric_cols] - values += values[:1] # Complete the circle + bars1 = ax.bar( + x - width / 2, + baseline_avg, + width, + label="Baseline", + color="#4CAF50", + ) + bars2 = ax.bar( + x + width / 2, dev_avg, width, label="Development", color="#2196F3" + ) - ax.plot( - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}' - ) - ax.fill(angles, values, alpha=0.25) + # Add value labels + for bar, val in zip(bars1, baseline_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) - ax.set_xticks(angles[:-1]) - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"]) - ax.set_ylim(0, 1) - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08) - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0)) + for bar, val in zip(bars2, dev_avg): + height = bar.get_height() + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height, + f"{val:.0f}" if val > 100 else f"{val:.2f}", + ha="center", + va="bottom", + fontsize=8, + ) + + ax.set_xlabel("Metrics") + ax.set_ylabel("Value") + ax.set_title(f"{config_key.upper()}") + ax.set_xticks(x) + ax.set_xticklabels(categories) + ax.legend(loc="upper right", fontsize=8) + ax.grid(True, alpha=0.3, axis="y") + else: + ax.text( + 0.5, + 0.5, + f"Insufficient data\nfor {config_key}", + ha="center", + va="center", + transform=ax.transAxes, + ) + ax.set_title(f"{config_key.upper()}") + + # Hide unused subplots + for idx 
in range(n_configs, n_rows * n_cols): + row = idx // n_cols + col = idx % n_cols + axes[row][col].set_visible(False) + + plt.suptitle( + "Performance Comparison Matrix: Baseline vs Development", + fontsize=14, + y=1.02, + ) output_file = os.path.join( self.output_dir, @@ -898,6 +1477,149 @@ class ResultsAnalyzer: ) plt.close() + def _plot_filesystem_comparison(self): + """Plot node performance comparison chart""" + if len(self.results_data) < 2: + return + + # Group results by node + node_performance = {} + + for result in self.results_data: + hostname, is_dev = self._extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "index_times": [], + "query_qps": [], + "is_dev": is_dev, + } + + # Collect metrics + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) + ) + + index_perf = result.get("index_performance", {}) + if index_perf: + node_performance[hostname]["index_times"].append( + index_perf.get("creation_time_seconds", 0) + ) + + # Get top-10 batch-1 query performance as representative + query_perf = result.get("query_performance", {}) + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]: + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0) + node_performance[hostname]["query_qps"].append(qps) + + # Only create comparison if we have multiple nodes + if len(node_performance) > 1: + # Calculate averages + node_metrics = {} + for hostname, perf_data in node_performance.items(): + node_metrics[hostname] = { + "avg_insert_rate": ( + np.mean(perf_data["insert_rates"]) + if perf_data["insert_rates"] + else 0 + ), + "avg_index_time": ( + np.mean(perf_data["index_times"]) + if perf_data["index_times"] + else 0 + ), + "avg_query_qps": ( + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0 + ), + "is_dev": perf_data["is_dev"], + } + + # Create 
comparison bar chart with more space + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8)) + + # Sort nodes with baseline first + sorted_nodes = sorted( + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0]) + ) + node_names = [hostname for hostname, _ in sorted_nodes] + + # Use different colors for baseline vs dev + colors = [ + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3" + for hostname in node_names + ] + + # Add labels for clarity + labels = [ + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})" + for hostname in node_names + ] + + # Insert rate comparison + insert_rates = [ + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names + ] + bars1 = ax1.bar(labels, insert_rates, color=colors) + ax1.set_title("Average Milvus Insert Rate by Node") + ax1.set_ylabel("Vectors/Second") + # Rotate labels for better readability + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Index time comparison (lower is better) + index_times = [ + node_metrics[hostname]["avg_index_time"] for hostname in node_names + ] + bars2 = ax2.bar(labels, index_times, color=colors) + ax2.set_title("Average Milvus Index Time by Node") + ax2.set_ylabel("Seconds (Lower is Better)") + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Query QPS comparison + query_qps = [ + node_metrics[hostname]["avg_query_qps"] for hostname in node_names + ] + bars3 = ax3.bar(labels, query_qps, color=colors) + ax3.set_title("Average Milvus Query QPS by Node") + ax3.set_ylabel("Queries/Second") + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8) + + # Add value labels on bars + for bars, values in [ + (bars1, insert_rates), + (bars2, index_times), + (bars3, query_qps), + ]: + for bar, value in zip(bars, values): + height = bar.get_height() + ax = bar.axes + ax.text( + bar.get_x() + bar.get_width() / 2.0, + height + height * 0.01, + f"{value:.1f}", + ha="center", + va="bottom", + 
fontsize=10, + ) + + plt.suptitle( + "Milvus Performance Comparison: Baseline vs Development Nodes", + fontsize=16, + y=1.02, + ) + plt.tight_layout() + + output_file = os.path.join( + self.output_dir, + f"filesystem_comparison.{self.config.get('graph_format', 'png')}", + ) + plt.savefig( + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight" + ) + plt.close() + def analyze(self) -> bool: """Run complete analysis""" self.logger.info("Starting results analysis...") diff --git a/workflows/ai/scripts/generate_graphs.py b/workflows/ai/scripts/generate_graphs.py index 2e183e86..fafc62bf 100755 --- a/workflows/ai/scripts/generate_graphs.py +++ b/workflows/ai/scripts/generate_graphs.py @@ -9,7 +9,6 @@ import sys import glob import numpy as np import matplotlib - matplotlib.use("Agg") # Use non-interactive backend import matplotlib.pyplot as plt from datetime import datetime @@ -17,6 +16,66 @@ from pathlib import Path from collections import defaultdict +def _extract_filesystem_config(result): + """Extract filesystem type and block size from result data. 
+    Returns (fs_type, block_size, config_key)"""
+    filename = result.get("_file", "")
+
+    # Primary: Extract filesystem type from filename (more reliable than JSON)
+    fs_type = "unknown"
+    block_size = "default"
+
+    if "xfs" in filename:
+        fs_type = "xfs"
+        # Check larger sizes first to avoid substring matches
+        if "64k-" in filename:
+            block_size = "64k"
+        elif "32k-" in filename:
+            block_size = "32k"
+        elif "16k-" in filename:
+            block_size = "16k"
+        elif "4k-" in filename:
+            block_size = "4k"
+    elif "ext4" in filename:
+        fs_type = "ext4"
+        if "16k-" in filename:
+            block_size = "16k"
+        elif "4k-" in filename:
+            block_size = "4k"
+    elif "btrfs" in filename:
+        fs_type = "btrfs"
+
+    # Fallback: Check JSON data if filename parsing failed
+    if fs_type == "unknown":
+        fs_type = result.get("filesystem", "unknown")
+
+    # Create descriptive config key
+    config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+    return fs_type, block_size, config_key
+
+
+def _extract_node_info(result):
+    """Extract node hostname and determine if it's a dev node.
+    Returns (hostname, is_dev_node)"""
+    # Get hostname from system_info (preferred) or fall back to filename
+    system_info = result.get("system_info", {})
+    hostname = system_info.get("hostname", "")
+
+    # If no hostname in system_info, try extracting from filename
+    if not hostname:
+        filename = result.get("_file", "")
+        # Remove results_ prefix and .json suffix
+        hostname = filename.replace("results_", "").replace(".json", "")
+        # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit(): + hostname = "_".join(hostname.split("_")[:-1]) + + # Determine if this is a dev node + is_dev = hostname.endswith("-dev") + + return hostname, is_dev + + def load_results(results_dir): """Load all JSON result files from the directory""" results = [] @@ -27,63 +86,8 @@ def load_results(results_dir): try: with open(json_file, "r") as f: data = json.load(f) - # Extract filesystem info - prefer from JSON data over filename - filename = os.path.basename(json_file) - - # First, try to get filesystem from the JSON data itself - fs_type = data.get("filesystem", None) - - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type: - parts = ( - filename.replace("results_", "").replace(".json", "").split("-") - ) - - # Parse host info - if "debian13-ai-" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") - .replace("_2.json", "") - .replace("_3.json", "") - .split("-") - ) - if "xfs" in host_parts[0]: - fs_type = "xfs" - # Extract block size (e.g., "4k", "16k", etc.) 
- block_size = ( - host_parts[1] if len(host_parts) > 1 else "unknown" - ) - elif "ext4" in host_parts[0]: - fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: - fs_type = "btrfs" - block_size = "default" - else: - fs_type = "unknown" - block_size = "unknown" - else: - fs_type = "unknown" - block_size = "unknown" - else: - # If filesystem came from JSON, set appropriate block size - if fs_type == "btrfs": - block_size = "default" - elif fs_type in ["ext4", "xfs"]: - block_size = data.get("block_size", "4k") - else: - block_size = data.get("block_size", "default") - - is_dev = "dev" in filename - - # Use filesystem from JSON if available, otherwise use parsed value - if "filesystem" not in data: - data["filesystem"] = fs_type - data["block_size"] = block_size - data["is_dev"] = is_dev - data["filename"] = filename - + # Add filename for filesystem detection + data["_file"] = os.path.basename(json_file) results.append(data) except Exception as e: print(f"Error loading {json_file}: {e}") @@ -91,1023 +95,240 @@ def load_results(results_dir): return results -def create_filesystem_comparison_chart(results, output_dir): - """Create a bar chart comparing performance across filesystems""" - # Group by filesystem and baseline/dev - fs_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract actual performance data from results - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - fs_data[fs][category].append(insert_qps) - - # Prepare data for plotting - filesystems = list(fs_data.keys()) - baseline_means = [ - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0 - for fs in filesystems - ] - dev_means = [ - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in 
filesystems - ] - - x = np.arange(len(filesystems)) - width = 0.35 - - fig, ax = plt.subplots(figsize=(10, 6)) - baseline_bars = ax.bar( - x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4" - ) - dev_bars = ax.bar( - x + width / 2, dev_means, width, label="Development", color="#ff7f0e" - ) - - ax.set_xlabel("Filesystem") - ax.set_ylabel("Insert QPS") - ax.set_title("Vector Database Performance by Filesystem") - ax.set_xticks(x) - ax.set_xticklabels(filesystems) - ax.legend() - ax.grid(True, alpha=0.3) - - # Add value labels on bars - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150) - plt.close() - - -def create_block_size_analysis(results, output_dir): - """Create analysis for different block sizes (XFS specific)""" - # Filter XFS results - xfs_results = [r for r in results if r.get("filesystem") == "xfs"] - - if not xfs_results: +def create_simple_performance_trends(results, output_dir): + """Create multi-node performance trends chart""" + if not results: return - # Group by block size - block_size_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in xfs_results: - block_size = result.get("block_size", "unknown") - category = "dev" if result.get("is_dev", False) else "baseline" - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - block_size_data[block_size][category].append(insert_qps) - - # Sort block sizes - block_sizes = sorted( - block_size_data.keys(), - key=lambda x: ( - int(x.replace("k", "").replace("s", "")) - if x not in ["unknown", "default"] - else 0 - ), - ) - - # Create grouped bar chart - baseline_means = [ - ( 
- np.mean(block_size_data[bs]["baseline"]) - if block_size_data[bs]["baseline"] - else 0 - ) - for bs in block_sizes - ] - dev_means = [ - np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0 - for bs in block_sizes - ] - - x = np.arange(len(block_sizes)) - width = 0.35 - - fig, ax = plt.subplots(figsize=(12, 6)) - baseline_bars = ax.bar( - x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c" - ) - dev_bars = ax.bar( - x + width / 2, dev_means, width, label="Development", color="#d62728" - ) - - ax.set_xlabel("Block Size") - ax.set_ylabel("Insert QPS") - ax.set_title("XFS Performance by Block Size") - ax.set_xticks(x) - ax.set_xticklabels(block_sizes) - ax.legend() - ax.grid(True, alpha=0.3) - - # Add value labels - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150) - plt.close() - - -def create_heatmap_analysis(results, output_dir): - """Create a heatmap showing AVERAGE performance across all test iterations""" - # Group data by configuration and version, collecting ALL values for averaging - config_data = defaultdict( - lambda: { - "baseline": {"insert": [], "query": [], "count": 0}, - "dev": {"insert": [], "query": [], "count": 0}, - } - ) + # Group results by node + node_performance = defaultdict(lambda: { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": False, + }) for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "default") - config = f"{fs}-{block_size}" - version = "dev" if result.get("is_dev", False) else "baseline" - - # Get actual insert performance - if "insert_performance" in result: - insert_qps = 
result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - # Calculate average query QPS - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - - # Collect all values for averaging - config_data[config][version]["insert"].append(insert_qps) - config_data[config][version]["query"].append(query_qps) - config_data[config][version]["count"] += 1 - - # Sort configurations - configs = sorted(config_data.keys()) - - # Calculate averages for heatmap - insert_baseline = [] - insert_dev = [] - query_baseline = [] - query_dev = [] - iteration_counts = {"baseline": 0, "dev": 0} - - for c in configs: - # Calculate average insert QPS - baseline_insert_vals = config_data[c]["baseline"]["insert"] - insert_baseline.append( - np.mean(baseline_insert_vals) if baseline_insert_vals else 0 - ) - - dev_insert_vals = config_data[c]["dev"]["insert"] - insert_dev.append(np.mean(dev_insert_vals) if dev_insert_vals else 0) - - # Calculate average query QPS - baseline_query_vals = config_data[c]["baseline"]["query"] - query_baseline.append( - np.mean(baseline_query_vals) if baseline_query_vals else 0 - ) - - dev_query_vals = config_data[c]["dev"]["query"] - query_dev.append(np.mean(dev_query_vals) if dev_query_vals else 0) - - # Track iteration counts - iteration_counts["baseline"] = max( - iteration_counts["baseline"], len(baseline_insert_vals) - ) - iteration_counts["dev"] = max(iteration_counts["dev"], len(dev_insert_vals)) - - # Create figure with custom heatmap - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8)) - - # Create data matrices - insert_data = np.array([insert_baseline, insert_dev]).T - 
query_data = np.array([query_baseline, query_dev]).T - - # Insert QPS heatmap - im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto") - ax1.set_xticks([0, 1]) - ax1.set_xticklabels(["Baseline", "Development"]) - ax1.set_yticks(range(len(configs))) - ax1.set_yticklabels(configs) - ax1.set_title( - f"Insert Performance - AVERAGE across {iteration_counts['baseline']} iterations\n(1M vectors, 128 dims, HNSW index)" - ) - ax1.set_ylabel("Configuration") - - # Add text annotations with dynamic color based on background - # Get the colormap to determine actual colors - cmap1 = plt.cm.YlOrRd - norm1 = plt.Normalize(vmin=insert_data.min(), vmax=insert_data.max()) - - for i in range(len(configs)): - for j in range(2): - # Get the actual color from the colormap - val = insert_data[i, j] - rgba = cmap1(norm1(val)) - # Calculate luminance using standard formula - # Perceived luminance: 0.299*R + 0.587*G + 0.114*B - luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2] - # Use white text on dark backgrounds (low luminance) - text_color = "white" if luminance < 0.5 else "black" + hostname, is_dev = _extract_node_info(result) + + if hostname not in node_performance: + node_performance[hostname] = { + "insert_rates": [], + "insert_times": [], + "iterations": [], + "is_dev": is_dev, + } - # Show average value with indicator - text = ax1.text( - j, - i, - f"{int(insert_data[i, j])}\n(avg)", - ha="center", - va="center", - color=text_color, - fontweight="bold", - fontsize=9, + insert_perf = result.get("insert_performance", {}) + if insert_perf: + node_performance[hostname]["insert_rates"].append( + insert_perf.get("vectors_per_second", 0) ) - - # Add colorbar - cbar1 = plt.colorbar(im1, ax=ax1) - cbar1.set_label("Insert QPS") - - # Query QPS heatmap - im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto") - ax2.set_xticks([0, 1]) - ax2.set_xticklabels(["Baseline", "Development"]) - ax2.set_yticks(range(len(configs))) - ax2.set_yticklabels(configs) - ax2.set_title( 
-        f"Query Performance - AVERAGE across {iteration_counts['dev']} iterations\n(1M vectors, 128 dims, HNSW index)"
-    )
-
-    # Add text annotations with dynamic color based on background
-    # Get the colormap to determine actual colors
-    cmap2 = plt.cm.YlGnBu
-    norm2 = plt.Normalize(vmin=query_data.min(), vmax=query_data.max())
-
-    for i in range(len(configs)):
-        for j in range(2):
-            # Get the actual color from the colormap
-            val = query_data[i, j]
-            rgba = cmap2(norm2(val))
-            # Calculate luminance using standard formula
-            # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
-            luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
-            # Use white text on dark backgrounds (low luminance)
-            text_color = "white" if luminance < 0.5 else "black"
-
-            # Show average value with indicator
-            text = ax2.text(
-                j,
-                i,
-                f"{int(query_data[i, j])}\n(avg)",
-                ha="center",
-                va="center",
-                color=text_color,
-                fontweight="bold",
-                fontsize=9,
-            )
+            node_performance[hostname]["insert_times"].append(
+                insert_perf.get("total_time_seconds", 0)
+            )
+            node_performance[hostname]["iterations"].append(
+                len(node_performance[hostname]["insert_rates"])
+            )
+
-    # Add colorbar
-    cbar2 = plt.colorbar(im2, ax=ax2)
-    cbar2.set_label("Query QPS")
-
-    # Add overall figure title
-    fig.suptitle(
-        "Performance Heatmap - Showing AVERAGES across Multiple Test Iterations",
-        fontsize=14,
-        fontweight="bold",
-        y=1.02,
-    )
-
-    plt.tight_layout()
-    plt.savefig(
-        os.path.join(output_dir, "performance_heatmap.png"),
-        dpi=150,
-        bbox_inches="tight",
-    )
-    plt.close()
-
-
-def create_performance_trends(results, output_dir):
-    """Create line charts showing performance trends"""
-    # Group by filesystem type
-    fs_types = defaultdict(
-        lambda: {
-            "configs": [],
-            "baseline_insert": [],
-            "dev_insert": [],
-            "baseline_query": [],
-            "dev_query": [],
-        }
-    )
-
-    for result in results:
-        fs = result.get("filesystem", "unknown")
-        block_size = result.get("block_size", "default")
-        config = f"{block_size}"
-
-        if config not in fs_types[fs]["configs"]:
-            fs_types[fs]["configs"].append(config)
-            fs_types[fs]["baseline_insert"].append(0)
-            fs_types[fs]["dev_insert"].append(0)
-            fs_types[fs]["baseline_query"].append(0)
-            fs_types[fs]["dev_query"].append(0)
-
-        idx = fs_types[fs]["configs"].index(config)
-
-        # Calculate average query QPS from all test configurations
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-
-        if result.get("is_dev", False):
-            if "insert_performance" in result:
-                fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["dev_query"][idx] = query_qps
-        else:
-            if "insert_performance" in result:
-                fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
-                    "vectors_per_second", 0
-                )
-            fs_types[fs]["baseline_query"][idx] = query_qps
-
-    # Create separate plots for each filesystem
-    for fs, data in fs_types.items():
-        if not data["configs"]:
-            continue
-
-        fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-        x = range(len(data["configs"]))
-
-        # Insert performance
-        ax1.plot(
-            x,
-            data["baseline_insert"],
-            "o-",
-            label="Baseline",
-            linewidth=2,
-            markersize=8,
-        )
-        ax1.plot(
-            x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax1.set_xlabel("Configuration")
-        ax1.set_ylabel("Insert QPS")
-        ax1.set_title(f"{fs.upper()} Insert Performance")
-        ax1.set_xticks(x)
-        ax1.set_xticklabels(data["configs"])
-        ax1.legend()
+    # Check if we have multi-node data
+    if len(node_performance) > 1:
+        # Multi-node mode: separate lines for each node
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        colors = ["b", "r", "g", "m", "c", "y", "k"]
+        color_idx = 0
+
+        for hostname, perf_data in node_performance.items():
+            if not perf_data["insert_rates"]:
+                continue
+
+            color = colors[color_idx % len(colors)]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=hostname,
+            )
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                f"{color}-o",
+                linewidth=2,
+                markersize=6,
+                label=hostname,
+            )
+
+            color_idx += 1
+
+        ax1.set_xlabel("Iteration")
+        ax1.set_ylabel("Vectors/Second")
+        ax1.set_title("Milvus Insert Rate by Node")
         ax1.grid(True, alpha=0.3)
-
-        # Query performance
-        ax2.plot(
-            x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
-        )
-        ax2.plot(
-            x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
-        )
-        ax2.set_xlabel("Configuration")
-        ax2.set_ylabel("Query QPS")
-        ax2.set_title(f"{fs.upper()} Query Performance")
-        ax2.set_xticks(x)
-        ax2.set_xticklabels(data["configs"])
-        ax2.legend()
+        ax1.legend()
+
+        ax2.set_xlabel("Iteration")
+        ax2.set_ylabel("Total Time (seconds)")
+        ax2.set_title("Milvus Insert Time by Node")
         ax2.grid(True, alpha=0.3)
-
-        plt.tight_layout()
-        plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
-        plt.close()
-
-
-def create_simple_performance_trends(results, output_dir):
-    """Create a simple performance trends chart for basic Milvus testing"""
-    if not results:
-        return
-
-    # Extract configuration from first result for display
-    config_text = ""
-    if results:
-        first_result = results[0]
-        if "config" in first_result:
-            cfg = first_result["config"]
-            config_text = (
-                f"Test Config:\n"
-                f"• {cfg.get('vector_dataset_size', 'N/A'):,} vectors/iteration\n"
-                f"• {cfg.get('vector_dimensions', 'N/A')} dimensions\n"
-                f"• {cfg.get('index_type', 'N/A')} index"
-            )
+        ax2.legend()
+    else:
+        # Single node mode: original behavior
+        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        # Extract insert data from the single node
+        hostname = list(node_performance.keys())[0] if node_performance else None
+        if hostname:
+            perf_data = node_performance[hostname]
+            iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+            # Plot insert rate
+            ax1.plot(
+                iterations,
+                perf_data["insert_rates"],
+                "b-o",
+                linewidth=2,
+                markersize=6,
+            )
+            ax1.set_xlabel("Iteration")
+            ax1.set_ylabel("Vectors/Second")
+            ax1.set_title("Vector Insert Rate Performance")
+            ax1.grid(True, alpha=0.3)
+
+            # Plot insert time
+            ax2.plot(
+                iterations,
+                perf_data["insert_times"],
+                "r-o",
+                linewidth=2,
+                markersize=6,
+            )
-
-    # Separate baseline and dev results
-    baseline_results = [r for r in results if not r.get("is_dev", False)]
-    dev_results = [r for r in results if r.get("is_dev", False)]
-
-    if not baseline_results and not dev_results:
-        return
-
-    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
-    # Prepare data
-    baseline_insert = []
-    baseline_query = []
-    dev_insert = []
-    dev_query = []
-    labels = []
-
-    # Process baseline results
-    for i, result in enumerate(baseline_results):
-        if "insert_performance" in result:
-            baseline_insert.append(
-                result["insert_performance"].get("vectors_per_second", 0)
-            )
-        else:
-            baseline_insert.append(0)
-
-        # Calculate average query QPS
-        query_qps = 0
-        if "query_performance" in result:
-            qp = result["query_performance"]
-            total_qps = 0
-            count = 0
-            for topk_key in ["topk_1", "topk_10", "topk_100"]:
-                if topk_key in qp:
-                    for batch_key in ["batch_1", "batch_10", "batch_100"]:
-                        if batch_key in qp[topk_key]:
-                            total_qps += qp[topk_key][batch_key].get(
-                                "queries_per_second", 0
-                            )
-                            count += 1
-            if count > 0:
-                query_qps = total_qps / count
-        baseline_query.append(query_qps)
-        labels.append(f"Iteration {i+1}")
-
-    # Process dev results
-    for result in 
dev_results: - if "insert_performance" in result: - dev_insert.append(result["insert_performance"].get("vectors_per_second", 0)) - else: - dev_insert.append(0) - - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - dev_query.append(query_qps) - - x = range(len(baseline_results) if baseline_results else len(dev_results)) - - # Insert performance - with visible markers for all points - if baseline_insert: - # Line plot with smaller markers - ax1.plot( - x, - baseline_insert, - "-", - label="Baseline", - linewidth=1.5, - color="blue", - alpha=0.6, - ) - # Add distinct markers for each point - ax1.scatter( - x, - baseline_insert, - s=30, - color="blue", - alpha=0.8, - edgecolors="darkblue", - linewidth=0.5, - zorder=5, - ) - if dev_insert: - # Line plot with smaller markers - ax1.plot( - x[: len(dev_insert)], - dev_insert, - "-", - label="Development", - linewidth=1.5, - color="red", - alpha=0.6, - ) - # Add distinct markers for each point - ax1.scatter( - x[: len(dev_insert)], - dev_insert, - s=30, - color="red", - alpha=0.8, - edgecolors="darkred", - linewidth=0.5, - marker="s", - zorder=5, - ) - ax1.set_xlabel("Test Iteration (same configuration, repeated for reliability)") - ax1.set_ylabel("Insert QPS (vectors/second)") - ax1.set_title("Milvus Insert Performance") - - # Handle x-axis labels to prevent overlap - num_points = len(x) - if num_points > 20: - # Show every 5th label for many iterations - step = 5 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax1.set_xticks(tick_positions) - 
ax1.set_xticklabels(tick_labels, rotation=45, ha="right") - elif num_points > 10: - # Show every 2nd label for moderate iterations - step = 2 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax1.set_xticks(tick_positions) - ax1.set_xticklabels(tick_labels, rotation=45, ha="right") - else: - # Show all labels for few iterations - ax1.set_xticks(x) - ax1.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x]) - - ax1.legend() - ax1.grid(True, alpha=0.3) - - # Add configuration text box - compact - if config_text: - ax1.text( - 0.02, - 0.98, - config_text, - transform=ax1.transAxes, - fontsize=6, - verticalalignment="top", - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85), - ) - - # Query performance - with visible markers for all points - if baseline_query: - # Line plot - ax2.plot( - x, - baseline_query, - "-", - label="Baseline", - linewidth=1.5, - color="blue", - alpha=0.6, - ) - # Add distinct markers for each point - ax2.scatter( - x, - baseline_query, - s=30, - color="blue", - alpha=0.8, - edgecolors="darkblue", - linewidth=0.5, - zorder=5, - ) - if dev_query: - # Line plot - ax2.plot( - x[: len(dev_query)], - dev_query, - "-", - label="Development", - linewidth=1.5, - color="red", - alpha=0.6, - ) - # Add distinct markers for each point - ax2.scatter( - x[: len(dev_query)], - dev_query, - s=30, - color="red", - alpha=0.8, - edgecolors="darkred", - linewidth=0.5, - marker="s", - zorder=5, - ) - ax2.set_xlabel("Test Iteration (same configuration, repeated for reliability)") - ax2.set_ylabel("Query QPS (queries/second)") - ax2.set_title("Milvus Query Performance") - - # Handle x-axis labels to prevent overlap - num_points = len(x) - if num_points > 20: - # Show every 5th label for many iterations - step = 5 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in 
tick_positions - ] - ax2.set_xticks(tick_positions) - ax2.set_xticklabels(tick_labels, rotation=45, ha="right") - elif num_points > 10: - # Show every 2nd label for moderate iterations - step = 2 - tick_positions = list(range(0, num_points, step)) - tick_labels = [ - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions - ] - ax2.set_xticks(tick_positions) - ax2.set_xticklabels(tick_labels, rotation=45, ha="right") - else: - # Show all labels for few iterations - ax2.set_xticks(x) - ax2.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x]) - - ax2.legend() - ax2.grid(True, alpha=0.3) - - # Add configuration text box - compact - if config_text: - ax2.text( - 0.02, - 0.98, - config_text, - transform=ax2.transAxes, - fontsize=6, - verticalalignment="top", - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85), - ) - + ax2.set_xlabel("Iteration") + ax2.set_ylabel("Total Time (seconds)") + ax2.set_title("Vector Insert Time Performance") + ax2.grid(True, alpha=0.3) + plt.tight_layout() plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150) plt.close() -def generate_summary_statistics(results, output_dir): - """Generate summary statistics and save to JSON""" - # Get unique filesystems, excluding "unknown" - filesystems = set() - for r in results: - fs = r.get("filesystem", "unknown") - if fs != "unknown": - filesystems.add(fs) - - summary = { - "total_tests": len(results), - "filesystems_tested": sorted(list(filesystems)), - "configurations": {}, - "performance_summary": { - "best_insert_qps": {"value": 0, "config": ""}, - "best_query_qps": {"value": 0, "config": ""}, - "average_insert_qps": 0, - "average_query_qps": 0, - }, - } - - # Calculate statistics - all_insert_qps = [] - all_query_qps = [] - - for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "default") - is_dev = "dev" if result.get("is_dev", False) else "baseline" - config_name = 
f"{fs}-{block_size}-{is_dev}" - - # Get actual performance metrics - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - # Calculate average query QPS - query_qps = 0 - if "query_performance" in result: - qp = result["query_performance"] - total_qps = 0 - count = 0 - for topk_key in ["topk_1", "topk_10", "topk_100"]: - if topk_key in qp: - for batch_key in ["batch_1", "batch_10", "batch_100"]: - if batch_key in qp[topk_key]: - total_qps += qp[topk_key][batch_key].get( - "queries_per_second", 0 - ) - count += 1 - if count > 0: - query_qps = total_qps / count - - all_insert_qps.append(insert_qps) - all_query_qps.append(query_qps) - - summary["configurations"][config_name] = { - "insert_qps": insert_qps, - "query_qps": query_qps, - "host": result.get("host", "unknown"), - } - - if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]: - summary["performance_summary"]["best_insert_qps"] = { - "value": insert_qps, - "config": config_name, - } - - if query_qps > summary["performance_summary"]["best_query_qps"]["value"]: - summary["performance_summary"]["best_query_qps"] = { - "value": query_qps, - "config": config_name, - } - - summary["performance_summary"]["average_insert_qps"] = ( - np.mean(all_insert_qps) if all_insert_qps else 0 - ) - summary["performance_summary"]["average_query_qps"] = ( - np.mean(all_query_qps) if all_query_qps else 0 - ) - - # Save summary - with open(os.path.join(output_dir, "summary.json"), "w") as f: - json.dump(summary, f, indent=2) - - return summary - - -def create_comprehensive_fs_comparison(results, output_dir): - """Create comprehensive filesystem performance comparison including all configurations""" - import matplotlib.pyplot as plt - import numpy as np - from collections import defaultdict - - # Collect data for all filesystem configurations - config_data = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: 
- fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "") - - # Create configuration label - if block_size and block_size != "default": - config_label = f"{fs}-{block_size}" - else: - config_label = fs - - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract performance metrics - if "insert_performance" in result: - insert_qps = result["insert_performance"].get("vectors_per_second", 0) - else: - insert_qps = 0 - - config_data[config_label][category].append(insert_qps) - - # Sort configurations for consistent display - configs = sorted(config_data.keys()) - - # Calculate means and standard deviations - baseline_means = [] - baseline_stds = [] - dev_means = [] - dev_stds = [] - - for config in configs: - baseline_vals = config_data[config]["baseline"] - dev_vals = config_data[config]["dev"] - - baseline_means.append(np.mean(baseline_vals) if baseline_vals else 0) - baseline_stds.append(np.std(baseline_vals) if baseline_vals else 0) - dev_means.append(np.mean(dev_vals) if dev_vals else 0) - dev_stds.append(np.std(dev_vals) if dev_vals else 0) - - # Create the plot - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10)) - - x = np.arange(len(configs)) - width = 0.35 - - # Top plot: Absolute performance - baseline_bars = ax1.bar( - x - width / 2, - baseline_means, - width, - yerr=baseline_stds, - label="Baseline", - color="#1f77b4", - capsize=5, - ) - dev_bars = ax1.bar( - x + width / 2, - dev_means, - width, - yerr=dev_stds, - label="Development", - color="#ff7f0e", - capsize=5, - ) - - ax1.set_ylabel("Insert QPS") - ax1.set_title("Vector Database Performance Across Filesystem Configurations") - ax1.set_xticks(x) - ax1.set_xticklabels(configs, rotation=45, ha="right") - ax1.legend() - ax1.grid(True, alpha=0.3) - - # Add value labels on bars - for bars in [baseline_bars, dev_bars]: - for bar in bars: - height = bar.get_height() - if height > 0: - ax1.annotate( - f"{height:.0f}", - xy=(bar.get_x() + bar.get_width() / 
2, height), - xytext=(0, 3), - textcoords="offset points", - ha="center", - va="bottom", - fontsize=8, - ) - - # Bottom plot: Percentage improvement (dev vs baseline) - improvements = [] - for i in range(len(configs)): - if baseline_means[i] > 0: - improvement = ((dev_means[i] - baseline_means[i]) / baseline_means[i]) * 100 - else: - improvement = 0 - improvements.append(improvement) - - colors = ["green" if x > 0 else "red" for x in improvements] - improvement_bars = ax2.bar(x, improvements, color=colors, alpha=0.7) - - ax2.set_ylabel("Performance Change (%)") - ax2.set_title("Development vs Baseline Performance Change") - ax2.set_xticks(x) - ax2.set_xticklabels(configs, rotation=45, ha="right") - ax2.axhline(y=0, color="black", linestyle="-", linewidth=0.5) - ax2.grid(True, alpha=0.3) - - # Add percentage labels - for bar, val in zip(improvement_bars, improvements): - ax2.annotate( - f"{val:.1f}%", - xy=(bar.get_x() + bar.get_width() / 2, val), - xytext=(0, 3 if val > 0 else -15), - textcoords="offset points", - ha="center", - va="bottom" if val > 0 else "top", - fontsize=8, - ) - - plt.tight_layout() - plt.savefig(os.path.join(output_dir, "comprehensive_fs_comparison.png"), dpi=150) - plt.close() - - -def create_fs_latency_comparison(results, output_dir): - """Create latency comparison across filesystems""" - import matplotlib.pyplot as plt - import numpy as np - from collections import defaultdict - - # Collect latency data - config_latency = defaultdict(lambda: {"baseline": [], "dev": []}) - - for result in results: - fs = result.get("filesystem", "unknown") - block_size = result.get("block_size", "") - - if block_size and block_size != "default": - config_label = f"{fs}-{block_size}" - else: - config_label = fs - - category = "dev" if result.get("is_dev", False) else "baseline" - - # Extract latency metrics - if "query_performance" in result: - latency_p99 = result["query_performance"].get("latency_p99_ms", 0) - else: - latency_p99 = 0 - - if latency_p99 > 0: 
-            config_latency[config_label][category].append(latency_p99)
-
-    if not config_latency:
+def create_heatmap_analysis(results, output_dir):
+    """Create multi-filesystem heatmap showing query performance"""
+    if not results:
         return
 
-    # Sort configurations
-    configs = sorted(config_latency.keys())
-
-    # Calculate statistics
-    baseline_p99 = []
-    dev_p99 = []
-
-    for config in configs:
-        baseline_vals = config_latency[config]["baseline"]
-        dev_vals = config_latency[config]["dev"]
-
-        baseline_p99.append(np.mean(baseline_vals) if baseline_vals else 0)
-        dev_p99.append(np.mean(dev_vals) if dev_vals else 0)
-
-    # Create plot
-    fig, ax = plt.subplots(figsize=(12, 6))
-
-    x = np.arange(len(configs))
-    width = 0.35
-
-    baseline_bars = ax.bar(
-        x - width / 2, baseline_p99, width, label="Baseline P99", color="#9467bd"
-    )
-    dev_bars = ax.bar(
-        x + width / 2, dev_p99, width, label="Development P99", color="#e377c2"
-    )
-
-    ax.set_xlabel("Filesystem Configuration")
-    ax.set_ylabel("Latency P99 (ms)")
-    ax.set_title("Query Latency (P99) Comparison Across Filesystems")
-    ax.set_xticks(x)
-    ax.set_xticklabels(configs, rotation=45, ha="right")
-    ax.legend()
-    ax.grid(True, alpha=0.3)
-
-    # Add value labels
-    for bars in [baseline_bars, dev_bars]:
-        for bar in bars:
-            height = bar.get_height()
-            if height > 0:
-                ax.annotate(
-                    f"{height:.1f}",
-                    xy=(bar.get_x() + bar.get_width() / 2, height),
-                    xytext=(0, 3),
-                    textcoords="offset points",
-                    ha="center",
-                    va="bottom",
-                    fontsize=8,
-                )
+    # Group data by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "query_data": [],
+        "config_key": "",
+    })
+    for result in results:
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
+
+        query_perf = result.get("query_performance", {})
+        for topk, topk_data in query_perf.items():
+            for batch, batch_data in topk_data.items():
+                qps = batch_data.get("queries_per_second", 0)
+                fs_performance[config_key]["query_data"].append({
+                    "topk": topk,
+                    "batch": batch,
+                    "qps": qps,
+                })
+        fs_performance[config_key]["config_key"] = config_key
+
+    # Check if we have multi-filesystem data
+    if len(fs_performance) > 1:
+        # Multi-filesystem mode: separate heatmaps for each filesystem
+        num_fs = len(fs_performance)
+        fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
+        if num_fs == 1:
+            axes = [axes]
+
+        # Define common structure for consistency
+        topk_order = ["topk_1", "topk_10", "topk_100"]
+        batch_order = ["batch_1", "batch_10", "batch_100"]
+
+        for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+            # Create matrix for this filesystem
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
+            axes[idx].set_title(f"{config_key.upper()} Query Performance")
+            axes[idx].set_xticks(range(len(batch_order)))
+            axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            axes[idx].set_yticks(range(len(topk_order)))
+            axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
+                                   ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=axes[idx])
+            cbar.set_label('Queries Per Second (QPS)')
+    else:
+        # Single filesystem mode
+        fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+        if fs_performance:
+            config_key = list(fs_performance.keys())[0]
+            perf_data = fs_performance[config_key]
+
+            # Create matrix
+            topk_order = ["topk_1", "topk_10", "topk_100"]
+            batch_order = ["batch_1", "batch_10", "batch_100"]
+            matrix = np.zeros((len(topk_order), len(batch_order)))
+
+            # Fill matrix with data
+            query_dict = {}
+            for item in perf_data["query_data"]:
+                query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+            for i, topk in enumerate(topk_order):
+                for j, batch in enumerate(batch_order):
+                    matrix[i, j] = query_dict.get((topk, batch), 0)
+
+            # Plot heatmap
+            im = ax.imshow(matrix, cmap='viridis', aspect='auto')
+            ax.set_title("Milvus Query Performance Heatmap")
+            ax.set_xticks(range(len(batch_order)))
+            ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+            ax.set_yticks(range(len(topk_order)))
+            ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+            # Add text annotations
+            for i in range(len(topk_order)):
+                for j in range(len(batch_order)):
+                    ax.text(j, i, f'{matrix[i, j]:.0f}',
+                            ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=ax)
+            cbar.set_label('Queries Per Second (QPS)')
+
     plt.tight_layout()
-    plt.savefig(os.path.join(output_dir, "filesystem_latency_comparison.png"), dpi=150)
+    plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
     plt.close()
 
 
@@ -1119,56 +340,23 @@ def main():
     results_dir = sys.argv[1]
     output_dir = sys.argv[2]
 
-    # Create output directory
+    # Ensure output directory exists
    os.makedirs(output_dir, exist_ok=True)
 
     # Load results
     results = load_results(results_dir)
-
     if not results:
-        print("No results found to analyze")
+        print(f"No valid results found in {results_dir}")
         sys.exit(1)
 
     print(f"Loaded {len(results)} result files")
 
     # Generate graphs
-    print("Generating performance heatmap...")
-    create_heatmap_analysis(results, output_dir)
-
-    print("Generating performance trends...")
     create_simple_performance_trends(results, output_dir)
+    create_heatmap_analysis(results, output_dir)
 
-    print("Generating summary statistics...")
-    summary = generate_summary_statistics(results, output_dir)
-
-    # Check if we have multiple filesystems to compare
-    filesystems = 
set(r.get("filesystem", "unknown") for r in results) - if len(filesystems) > 1: - print("Generating filesystem comparison chart...") - create_filesystem_comparison_chart(results, output_dir) - - print("Generating comprehensive filesystem comparison...") - create_comprehensive_fs_comparison(results, output_dir) - - print("Generating filesystem latency comparison...") - create_fs_latency_comparison(results, output_dir) - - # Check if we have XFS results with different block sizes - xfs_results = [r for r in results if r.get("filesystem") == "xfs"] - block_sizes = set(r.get("block_size", "unknown") for r in xfs_results) - if len(block_sizes) > 1: - print("Generating XFS block size analysis...") - create_block_size_analysis(results, output_dir) - - print(f"\nAnalysis complete! Graphs saved to {output_dir}") - print(f"Total configurations tested: {summary['total_tests']}") - print( - f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})" - ) - print( - f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})" - ) + print(f"Graphs generated in {output_dir}") if __name__ == "__main__": - main() + main() \ No newline at end of file diff --git a/workflows/ai/scripts/generate_html_report.py b/workflows/ai/scripts/generate_html_report.py index 3aa8342f..01ec734c 100755 --- a/workflows/ai/scripts/generate_html_report.py +++ b/workflows/ai/scripts/generate_html_report.py @@ -180,7 +180,7 @@ HTML_TEMPLATE = """
    -

    AI Vector Database Benchmark Results

    +

    Milvus Vector Database Benchmark Results

    Generated on {timestamp}
    @@ -238,11 +238,13 @@ HTML_TEMPLATE = """
-

Detailed Results Table

+

Milvus Performance by Storage Filesystem

+

This table shows how the Milvus vector database performs when its data is stored on different filesystem types and configurations.

- + + @@ -293,27 +295,53 @@ def load_results(results_dir): # Get filesystem from JSON data fs_type = data.get("filesystem", None) - # If not in JSON, try to parse from filename (backwards compatibility) - if not fs_type and "debian13-ai" in filename: - host_parts = ( - filename.replace("results_debian13-ai-", "") - .replace("_1.json", "") + # Always try to parse from filename first since JSON data might be wrong + if "-ai-" in filename: + # Handle both debian13-ai- and prod-ai- prefixes + cleaned_filename = filename.replace("results_", "") + + # Extract the part after -ai- + if "debian13-ai-" in cleaned_filename: + host_part = cleaned_filename.replace("debian13-ai-", "") + elif "prod-ai-" in cleaned_filename: + host_part = cleaned_filename.replace("prod-ai-", "") + else: + # Generic extraction + ai_index = cleaned_filename.find("-ai-") + if ai_index != -1: + host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-" + else: + host_part = cleaned_filename + + # Remove file extensions and dev suffix + host_part = ( + host_part.replace("_1.json", "") .replace("_2.json", "") .replace("_3.json", "") - .split("-") + .replace("-dev", "") ) - if "xfs" in host_parts[0]: + + # Parse filesystem type and block size + if host_part.startswith("xfs-"): fs_type = "xfs" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "ext4" in host_parts[0]: + # Extract block size: xfs-4k-4ks -> 4k + parts = host_part.split("-") + if len(parts) >= 2: + block_size = parts[1] # 4k, 16k, 32k, 64k + else: + block_size = "4k" + elif host_part.startswith("ext4-"): fs_type = "ext4" - block_size = host_parts[1] if len(host_parts) > 1 else "4k" - elif "btrfs" in host_parts[0]: + parts = host_part.split("-") + block_size = parts[1] if len(parts) > 1 else "4k" + elif host_part.startswith("btrfs"): fs_type = "btrfs" block_size = "default" else: - fs_type = "unknown" - block_size = "unknown" + # Fallback to JSON data if available + if not fs_type: + fs_type = "unknown" + block_size = 
"unknown" else: # Set appropriate block size based on filesystem if fs_type == "btrfs": @@ -371,12 +399,36 @@ def generate_table_rows(results, best_configs): if config_key in best_configs: row_class += " best-config" + # Generate descriptive labels showing Milvus is running on this filesystem + if result["filesystem"] == "xfs" and result["block_size"] != "default": + storage_label = f"XFS {result['block_size'].upper()}" + config_details = f"Block size: {result['block_size']}, Milvus data on XFS" + elif result["filesystem"] == "ext4": + storage_label = "EXT4" + if "bigalloc" in result.get("host", "").lower(): + config_details = "EXT4 with bigalloc, Milvus data on ext4" + else: + config_details = ( + f"Block size: {result['block_size']}, Milvus data on ext4" + ) + elif result["filesystem"] == "btrfs": + storage_label = "BTRFS" + config_details = "Default Btrfs settings, Milvus data on Btrfs" + else: + storage_label = result["filesystem"].upper() + config_details = f"Milvus data on {result['filesystem']}" + + # Extract clean node identifier from hostname + node_name = result["host"].replace("results_", "").replace(".json", "") + row = f""" - + + + """ @@ -483,8 +535,8 @@ def generate_html_report(results_dir, graphs_dir, output_path):
  • Block Size Analysis
  • """ filesystem_comparison_section = """
    -

    Filesystem Performance Comparison

    -

    Comparison of vector database performance across different filesystems, showing both baseline and development kernel results.

    +

    Milvus Storage Filesystem Comparison

    +

    Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.

    Filesystem Comparison
    @@ -499,9 +551,9 @@ def generate_html_report(results_dir, graphs_dir, output_path):
    """ # Multi-fs mode: show filesystem info - fourth_card_title = "Filesystems Tested" + fourth_card_title = "Storage Filesystems" fourth_card_value = str(len(filesystems_tested)) - fourth_card_label = ", ".join(filesystems_tested).upper() + fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data" else: # Single filesystem mode - hide multi-fs sections filesystem_nav_items = "" -- 2.50.1
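A note on the matrix-filling step in `create_heatmap_analysis`: it is easy to verify in isolation by factoring it into a standalone helper. The sketch below mirrors the patch's nested `query_performance` layout (`topk_* -> batch_* -> queries_per_second`); the helper name `build_qps_matrix` and the sample numbers are illustrative, not part of the patch:

```python
import numpy as np

def build_qps_matrix(query_perf, topk_order, batch_order):
    """Flatten nested {topk: {batch: {"queries_per_second": ...}}} results
    into a (len(topk_order) x len(batch_order)) matrix; missing cells are 0."""
    matrix = np.zeros((len(topk_order), len(batch_order)))
    for i, topk in enumerate(topk_order):
        for j, batch in enumerate(batch_order):
            cell = query_perf.get(topk, {}).get(batch, {})
            matrix[i, j] = cell.get("queries_per_second", 0)
    return matrix

# Made-up sample data exercising both present and absent cells
query_perf = {
    "topk_1": {"batch_1": {"queries_per_second": 1200.0}},
    "topk_10": {"batch_100": {"queries_per_second": 340.0}},
}
m = build_qps_matrix(
    query_perf,
    ["topk_1", "topk_10", "topk_100"],
    ["batch_1", "batch_10", "batch_100"],
)
```

Defaulting absent (topk, batch) combinations to 0 keeps every per-filesystem heatmap the same shape, which is what lets the patch place them side by side with a shared axis layout.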
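The filename-parsing branch added to `load_results` in generate_html_report.py can likewise be exercised on its own. This is a condensed sketch of the parsing rules in the hunk above (strip the `results_` prefix, take the part after `-ai-`, drop iteration suffixes and the `-dev` marker, then split filesystem and block size); the function name and example filenames are made up for illustration:

```python
def parse_fs_config(filename):
    """Derive (filesystem, block_size) from a results filename such as
    'results_debian13-ai-xfs-4k-4ks_1.json'."""
    name = filename.replace("results_", "")
    # Take the host part after "-ai-" (covers debian13-ai-, prod-ai-, etc.)
    ai_index = name.find("-ai-")
    host_part = name[ai_index + 4:] if ai_index != -1 else name
    # Drop iteration suffixes and the -dev marker
    for suffix in ("_1.json", "_2.json", "_3.json"):
        host_part = host_part.replace(suffix, "")
    host_part = host_part.replace("-dev", "")
    # Split filesystem type and block size
    if host_part.startswith("xfs-"):
        parts = host_part.split("-")
        return "xfs", parts[1] if len(parts) >= 2 else "4k"
    if host_part.startswith("ext4-"):
        parts = host_part.split("-")
        return "ext4", parts[1] if len(parts) > 1 else "4k"
    if host_part.startswith("btrfs"):
        return "btrfs", "default"
    return "unknown", "unknown"
```

Since the patch notes the JSON's `filesystem` field may be wrong, this filename-derived value is used first, with the JSON field only as a fallback.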