* [PATCH 0/2] kdevops: add milvus with minio support
From: Luis Chamberlain @ 2025-08-27 9:31 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
Cc: Luis Chamberlain
This adds the ability to test Milvus on MinIO with different filesystem
configuration targets. There is a basic configuration you can run which
supports just one default filesystem, used both for where you place your
docker images and for where we place the MinIO instance. Then there is
multifs support where, just as with fstests on kdevops, you can select a
slew of different filesystem targets to test.
The recommendation is to stick to 40 iterations with the 1,000,000 vector
dataset unless you have more than 100 GiB per guest to spare. If you have
space to spare then you know how to ballpark it.
On high-capacity SSDs, the world is our oyster.
You can see a demo of results here:
https://github.com/mcgrof/demo-milvus-kdevops-results
These are just demos. On guests. Nothing really useful.
I should point out that A/B testing is automated as well, so we can
leverage this to test, for instance, parallel writeback in an automated
way ;)
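As a rough sketch, and assuming the A/B knobs enabled by the defconfigs in
this series (CONFIG_KDEVOPS_BASELINE_AND_DEV), the automated flow uses the
targets documented in docs/ai/README.md:
make defconfig-ai-milvus-docker  # configure; this defconfig enables A/B
make bringup                     # provision baseline and dev guests
make ai-baseline                 # run the benchmark on the baseline nodes
make ai-dev                      # run the same benchmark on the dev nodes
make ai-results                  # collect and compare the two result sets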
If you want to test this you can also use this branch on kdevops:
https://github.com/linux-kdevops/kdevops/tree/mcgrof/20250827-milvus
I am hoping someone will just prompt an AI for bare metal support while I
sleep. It should be... easy. Just create the partitions already, use one
host, and ask in the prompt not to mkfs for you. So don't use multi-fs
support at first; just use the option to create the storage partition
where you place docker. In fact you can copy and paste this prompt to the
AI, and I think it will know what to do. You just skip some steps, as the
filesystems can be created and mounted for you. You just need to create
the host file manually for the target node, and to enable infer user and
group id support (WORKFLOW_INFER_USER_AND_GROUP).
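A minimal hypothetical sketch of such a manual hosts file follows; the
"ai" group is what the playbooks in this series target (hosts: ai), while
the inventory file name and node name are assumptions, not something this
series dictates:
# hypothetical bare metal inventory, written to the top-level hosts file
cat > hosts <<EOF
[ai]
baremetal-ai-node
EOF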
Luis Chamberlain (2):
ai: add Milvus vector database benchmarking support
ai: add multi-filesystem testing support for Milvus benchmarks
.github/workflows/docker-tests.yml | 6 +
.gitignore | 3 +-
Makefile | 2 +-
README.md | 18 +
defconfigs/ai-milvus-docker | 113 ++
defconfigs/ai-milvus-docker-ci | 51 +
defconfigs/ai-milvus-multifs | 67 +
defconfigs/ai-milvus-multifs-distro | 109 ++
defconfigs/ai-milvus-multifs-extended | 108 ++
docs/ai/README.md | 108 ++
docs/ai/vector-databases/README.md | 75 +
docs/ai/vector-databases/milvus.md | 264 +++
kconfigs/workflows/Kconfig | 27 +
playbooks/ai.yml | 11 +
playbooks/ai_benchmark.yml | 8 +
playbooks/ai_destroy.yml | 24 +
playbooks/ai_install.yml | 14 +
playbooks/ai_multifs.yml | 24 +
playbooks/ai_results.yml | 6 +
playbooks/ai_setup.yml | 6 +
playbooks/ai_tests.yml | 31 +
playbooks/ai_uninstall.yml | 6 +
.../debian13-ai-btrfs-default-dev.yml | 8 +
.../host_vars/debian13-ai-btrfs-default.yml | 8 +
.../debian13-ai-ext4-16k-bigalloc-dev.yml | 8 +
.../debian13-ai-ext4-16k-bigalloc.yml | 8 +
.../host_vars/debian13-ai-ext4-4k-dev.yml | 8 +
playbooks/host_vars/debian13-ai-ext4-4k.yml | 8 +
.../host_vars/debian13-ai-xfs-16k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-16k-4ks.yml | 10 +
.../host_vars/debian13-ai-xfs-32k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-32k-4ks.yml | 10 +
.../host_vars/debian13-ai-xfs-4k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-64k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-64k-4ks.yml | 10 +
.../files/analyze_results.py | 1701 +++++++++++++++++
.../files/generate_better_graphs.py | 550 ++++++
.../files/generate_graphs.py | 362 ++++
.../files/generate_html_report.py | 610 ++++++
.../roles/ai_collect_results/tasks/main.yml | 202 ++
.../templates/analysis_config.json.j2 | 6 +
playbooks/roles/ai_destroy/tasks/main.yml | 63 +
.../roles/ai_docker_storage/tasks/main.yml | 123 ++
playbooks/roles/ai_install/tasks/main.yml | 90 +
.../roles/ai_milvus_storage/tasks/main.yml | 161 ++
.../tasks/generate_comparison.yml | 279 +++
playbooks/roles/ai_multifs_run/tasks/main.yml | 23 +
.../tasks/run_single_filesystem.yml | 104 +
.../templates/milvus_config.json.j2 | 42 +
.../roles/ai_multifs_setup/defaults/main.yml | 49 +
.../roles/ai_multifs_setup/tasks/main.yml | 70 +
playbooks/roles/ai_results/tasks/main.yml | 22 +
.../files/milvus_benchmark.py | 556 ++++++
.../roles/ai_run_benchmarks/tasks/main.yml | 181 ++
.../templates/benchmark_config.json.j2 | 24 +
playbooks/roles/ai_setup/tasks/main.yml | 115 ++
playbooks/roles/ai_uninstall/tasks/main.yml | 62 +
playbooks/roles/gen_hosts/tasks/main.yml | 33 +
.../roles/gen_hosts/templates/fstests.j2 | 2 +
playbooks/roles/gen_hosts/templates/gitr.j2 | 2 +
playbooks/roles/gen_hosts/templates/hosts.j2 | 99 +
.../roles/gen_hosts/templates/nfstest.j2 | 2 +
playbooks/roles/gen_hosts/templates/pynfs.j2 | 2 +
playbooks/roles/gen_nodes/tasks/main.yml | 124 ++
.../roles/guestfs/tasks/bringup/main.yml | 15 +
playbooks/roles/milvus/README.md | 181 ++
playbooks/roles/milvus/defaults/main.yml | 74 +
.../roles/milvus/files/milvus_benchmark.py | 348 ++++
playbooks/roles/milvus/files/milvus_utils.py | 134 ++
playbooks/roles/milvus/meta/main.yml | 30 +
playbooks/roles/milvus/tasks/benchmark.yml | 61 +
.../roles/milvus/tasks/benchmark_setup.yml | 58 +
.../roles/milvus/tasks/install_docker.yml | 97 +
playbooks/roles/milvus/tasks/main.yml | 52 +
playbooks/roles/milvus/tasks/setup.yml | 107 ++
.../milvus/templates/benchmark_config.json.j2 | 25 +
.../templates/docker-compose.override.yml.j2 | 24 +
.../milvus/templates/docker-compose.yml.j2 | 64 +
.../roles/milvus/templates/milvus.yaml.j2 | 30 +
.../milvus/templates/test_connection.py.j2 | 25 +
scripts/guestfs.Makefile | 2 +-
workflows/Makefile | 4 +
workflows/ai/Kconfig | 177 ++
workflows/ai/Kconfig.docker | 172 ++
workflows/ai/Kconfig.docker-storage | 201 ++
workflows/ai/Kconfig.fs | 118 ++
workflows/ai/Kconfig.multifs | 184 ++
workflows/ai/Kconfig.native | 184 ++
workflows/ai/Makefile | 160 ++
workflows/ai/scripts/analysis_config.json | 6 +
workflows/ai/scripts/analyze_results.py | 1701 +++++++++++++++++
workflows/ai/scripts/generate_graphs.py | 362 ++++
workflows/ai/scripts/generate_html_report.py | 610 ++++++
93 files changed, 12061 insertions(+), 3 deletions(-)
create mode 100644 defconfigs/ai-milvus-docker
create mode 100644 defconfigs/ai-milvus-docker-ci
create mode 100644 defconfigs/ai-milvus-multifs
create mode 100644 defconfigs/ai-milvus-multifs-distro
create mode 100644 defconfigs/ai-milvus-multifs-extended
create mode 100644 docs/ai/README.md
create mode 100644 docs/ai/vector-databases/README.md
create mode 100644 docs/ai/vector-databases/milvus.md
create mode 100644 playbooks/ai.yml
create mode 100644 playbooks/ai_benchmark.yml
create mode 100644 playbooks/ai_destroy.yml
create mode 100644 playbooks/ai_install.yml
create mode 100644 playbooks/ai_multifs.yml
create mode 100644 playbooks/ai_results.yml
create mode 100644 playbooks/ai_setup.yml
create mode 100644 playbooks/ai_tests.yml
create mode 100644 playbooks/ai_uninstall.yml
create mode 100644 playbooks/host_vars/debian13-ai-btrfs-default-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-btrfs-default.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-16k-bigalloc-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-16k-bigalloc.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-4k-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-4k.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-16k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-16k-4ks.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-32k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-32k-4ks.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-64k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-64k-4ks.yml
create mode 100755 playbooks/roles/ai_collect_results/files/analyze_results.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_better_graphs.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_graphs.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_html_report.py
create mode 100644 playbooks/roles/ai_collect_results/tasks/main.yml
create mode 100644 playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
create mode 100644 playbooks/roles/ai_destroy/tasks/main.yml
create mode 100644 playbooks/roles/ai_docker_storage/tasks/main.yml
create mode 100644 playbooks/roles/ai_install/tasks/main.yml
create mode 100644 playbooks/roles/ai_milvus_storage/tasks/main.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/main.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
create mode 100644 playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
create mode 100644 playbooks/roles/ai_multifs_setup/defaults/main.yml
create mode 100644 playbooks/roles/ai_multifs_setup/tasks/main.yml
create mode 100644 playbooks/roles/ai_results/tasks/main.yml
create mode 100644 playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
create mode 100644 playbooks/roles/ai_run_benchmarks/tasks/main.yml
create mode 100644 playbooks/roles/ai_run_benchmarks/templates/benchmark_config.json.j2
create mode 100644 playbooks/roles/ai_setup/tasks/main.yml
create mode 100644 playbooks/roles/ai_uninstall/tasks/main.yml
create mode 100644 playbooks/roles/milvus/README.md
create mode 100644 playbooks/roles/milvus/defaults/main.yml
create mode 100644 playbooks/roles/milvus/files/milvus_benchmark.py
create mode 100644 playbooks/roles/milvus/files/milvus_utils.py
create mode 100644 playbooks/roles/milvus/meta/main.yml
create mode 100644 playbooks/roles/milvus/tasks/benchmark.yml
create mode 100644 playbooks/roles/milvus/tasks/benchmark_setup.yml
create mode 100644 playbooks/roles/milvus/tasks/install_docker.yml
create mode 100644 playbooks/roles/milvus/tasks/main.yml
create mode 100644 playbooks/roles/milvus/tasks/setup.yml
create mode 100644 playbooks/roles/milvus/templates/benchmark_config.json.j2
create mode 100644 playbooks/roles/milvus/templates/docker-compose.override.yml.j2
create mode 100644 playbooks/roles/milvus/templates/docker-compose.yml.j2
create mode 100644 playbooks/roles/milvus/templates/milvus.yaml.j2
create mode 100644 playbooks/roles/milvus/templates/test_connection.py.j2
create mode 100644 workflows/ai/Kconfig
create mode 100644 workflows/ai/Kconfig.docker
create mode 100644 workflows/ai/Kconfig.docker-storage
create mode 100644 workflows/ai/Kconfig.fs
create mode 100644 workflows/ai/Kconfig.multifs
create mode 100644 workflows/ai/Kconfig.native
create mode 100644 workflows/ai/Makefile
create mode 100644 workflows/ai/scripts/analysis_config.json
create mode 100755 workflows/ai/scripts/analyze_results.py
create mode 100755 workflows/ai/scripts/generate_graphs.py
create mode 100755 workflows/ai/scripts/generate_html_report.py
--
2.50.1
* [PATCH 1/2] ai: add Milvus vector database benchmarking support
From: Luis Chamberlain @ 2025-08-27 9:32 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
Cc: Luis Chamberlain
Add initial AI/ML workflow infrastructure starting with Milvus
vector database benchmarking. This provides a foundation for testing
AI systems with the same rigor as existing kernel testing workflows
(fstests, blktests).
Key features:
- Docker-based Milvus deployment with etcd and MinIO
- Support for using a dedicated drive for Docker's /var/lib/docker/,
  including custom filesystem configurations
- Python virtual environment management for benchmark dependencies
- Comprehensive benchmarking of vector operations (insert, search, delete)
- A/B testing support for baseline vs development comparisons
- Performance visualization focusing on key metrics (QPS, latency)
- Result collection and analysis infrastructure
Performance Metrics:
The benchmarks focus on two critical vector database metrics:
- QPS (Queries Per Second): Throughput measurement for search operations
- Latency: Response time percentiles (p50, p95, p99) for operations
Recall rate measurement is challenging without ground truth data - the
correct answers must be known beforehand to measure search accuracy.
Since we generate random vectors for testing, establishing meaningful
ground truth would require careful similarity calculations that would
essentially duplicate the work being tested.
Defconfigs:
- ai-milvus-docker: Standard Docker-based Milvus deployment
- ai-milvus-docker-ci: CI-optimized with minimal dataset (1000 vectors)
Workflow integration follows kdevops patterns:
make defconfig-ai-milvus-docker
make bringup
make ai # Setup infrastructure
make ai-tests # Run benchmarks
make ai-results # View results
The implementation handles proper cleanup, lock file management, and
comprehensive error handling to ensure reliable benchmark execution.
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
.gitignore | 3 +-
README.md | 18 +
defconfigs/ai-milvus-docker | 113 ++
defconfigs/ai-milvus-docker-ci | 51 +
docs/ai/README.md | 108 ++
docs/ai/vector-databases/README.md | 76 ++
docs/ai/vector-databases/milvus.md | 264 ++++
kconfigs/workflows/Kconfig | 27 +
playbooks/ai.yml | 11 +
playbooks/ai_benchmark.yml | 8 +
playbooks/ai_destroy.yml | 24 +
playbooks/ai_install.yml | 8 +
playbooks/ai_results.yml | 6 +
playbooks/ai_setup.yml | 6 +
playbooks/ai_tests.yml | 31 +
playbooks/ai_uninstall.yml | 6 +
.../debian13-ai-btrfs-default-dev.yml | 8 +
.../host_vars/debian13-ai-btrfs-default.yml | 8 +
.../debian13-ai-ext4-16k-bigalloc-dev.yml | 8 +
.../debian13-ai-ext4-16k-bigalloc.yml | 8 +
.../host_vars/debian13-ai-ext4-4k-dev.yml | 8 +
playbooks/host_vars/debian13-ai-ext4-4k.yml | 8 +
.../host_vars/debian13-ai-xfs-16k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-16k-4ks.yml | 10 +
.../host_vars/debian13-ai-xfs-32k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-32k-4ks.yml | 10 +
.../host_vars/debian13-ai-xfs-4k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-4k-4ks.yml | 10 +
.../host_vars/debian13-ai-xfs-64k-4ks-dev.yml | 10 +
.../host_vars/debian13-ai-xfs-64k-4ks.yml | 10 +
.../files/analyze_results.py | 979 ++++++++++++++
.../files/generate_better_graphs.py | 548 ++++++++
.../files/generate_graphs.py | 678 ++++++++++
.../files/generate_html_report.py | 427 ++++++
.../roles/ai_collect_results/tasks/main.yml | 220 +++
.../templates/analysis_config.json.j2 | 6 +
playbooks/roles/ai_destroy/tasks/main.yml | 63 +
.../roles/ai_docker_storage/tasks/main.yml | 123 ++
playbooks/roles/ai_install/tasks/main.yml | 90 ++
playbooks/roles/ai_results/tasks/main.yml | 22 +
.../files/milvus_benchmark.py | 506 +++++++
.../roles/ai_run_benchmarks/tasks/main.yml | 181 +++
.../templates/benchmark_config.json.j2 | 24 +
playbooks/roles/ai_setup/tasks/main.yml | 115 ++
playbooks/roles/ai_uninstall/tasks/main.yml | 62 +
playbooks/roles/gen_hosts/tasks/main.yml | 14 +
playbooks/roles/gen_hosts/templates/hosts.j2 | 108 ++
playbooks/roles/gen_nodes/tasks/main.yml | 34 +
playbooks/roles/milvus/README.md | 181 +++
playbooks/roles/milvus/defaults/main.yml | 74 ++
.../roles/milvus/files/milvus_benchmark.py | 348 +++++
playbooks/roles/milvus/files/milvus_utils.py | 134 ++
playbooks/roles/milvus/meta/main.yml | 30 +
playbooks/roles/milvus/tasks/benchmark.yml | 61 +
.../roles/milvus/tasks/benchmark_setup.yml | 58 +
.../roles/milvus/tasks/install_docker.yml | 97 ++
playbooks/roles/milvus/tasks/main.yml | 52 +
playbooks/roles/milvus/tasks/setup.yml | 107 ++
.../milvus/templates/benchmark_config.json.j2 | 25 +
.../templates/docker-compose.override.yml.j2 | 24 +
.../milvus/templates/docker-compose.yml.j2 | 64 +
.../roles/milvus/templates/milvus.yaml.j2 | 30 +
.../milvus/templates/test_connection.py.j2 | 25 +
workflows/Makefile | 4 +
workflows/ai/Kconfig | 164 +++
workflows/ai/Kconfig.docker | 172 +++
workflows/ai/Kconfig.docker-storage | 201 +++
workflows/ai/Kconfig.native | 184 +++
workflows/ai/Makefile | 160 +++
workflows/ai/scripts/analysis_config.json | 6 +
workflows/ai/scripts/analyze_results.py | 979 ++++++++++++++
workflows/ai/scripts/generate_graphs.py | 1174 +++++++++++++++++
workflows/ai/scripts/generate_html_report.py | 558 ++++++++
73 files changed, 9999 insertions(+), 1 deletion(-)
create mode 100644 defconfigs/ai-milvus-docker
create mode 100644 defconfigs/ai-milvus-docker-ci
create mode 100644 docs/ai/README.md
create mode 100644 docs/ai/vector-databases/README.md
create mode 100644 docs/ai/vector-databases/milvus.md
create mode 100644 playbooks/ai.yml
create mode 100644 playbooks/ai_benchmark.yml
create mode 100644 playbooks/ai_destroy.yml
create mode 100644 playbooks/ai_install.yml
create mode 100644 playbooks/ai_results.yml
create mode 100644 playbooks/ai_setup.yml
create mode 100644 playbooks/ai_tests.yml
create mode 100644 playbooks/ai_uninstall.yml
create mode 100644 playbooks/host_vars/debian13-ai-btrfs-default-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-btrfs-default.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-16k-bigalloc-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-16k-bigalloc.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-4k-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-ext4-4k.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-16k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-16k-4ks.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-32k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-32k-4ks.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-64k-4ks-dev.yml
create mode 100644 playbooks/host_vars/debian13-ai-xfs-64k-4ks.yml
create mode 100755 playbooks/roles/ai_collect_results/files/analyze_results.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_better_graphs.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_graphs.py
create mode 100755 playbooks/roles/ai_collect_results/files/generate_html_report.py
create mode 100644 playbooks/roles/ai_collect_results/tasks/main.yml
create mode 100644 playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
create mode 100644 playbooks/roles/ai_destroy/tasks/main.yml
create mode 100644 playbooks/roles/ai_docker_storage/tasks/main.yml
create mode 100644 playbooks/roles/ai_install/tasks/main.yml
create mode 100644 playbooks/roles/ai_results/tasks/main.yml
create mode 100644 playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
create mode 100644 playbooks/roles/ai_run_benchmarks/tasks/main.yml
create mode 100644 playbooks/roles/ai_run_benchmarks/templates/benchmark_config.json.j2
create mode 100644 playbooks/roles/ai_setup/tasks/main.yml
create mode 100644 playbooks/roles/ai_uninstall/tasks/main.yml
create mode 100644 playbooks/roles/milvus/README.md
create mode 100644 playbooks/roles/milvus/defaults/main.yml
create mode 100644 playbooks/roles/milvus/files/milvus_benchmark.py
create mode 100644 playbooks/roles/milvus/files/milvus_utils.py
create mode 100644 playbooks/roles/milvus/meta/main.yml
create mode 100644 playbooks/roles/milvus/tasks/benchmark.yml
create mode 100644 playbooks/roles/milvus/tasks/benchmark_setup.yml
create mode 100644 playbooks/roles/milvus/tasks/install_docker.yml
create mode 100644 playbooks/roles/milvus/tasks/main.yml
create mode 100644 playbooks/roles/milvus/tasks/setup.yml
create mode 100644 playbooks/roles/milvus/templates/benchmark_config.json.j2
create mode 100644 playbooks/roles/milvus/templates/docker-compose.override.yml.j2
create mode 100644 playbooks/roles/milvus/templates/docker-compose.yml.j2
create mode 100644 playbooks/roles/milvus/templates/milvus.yaml.j2
create mode 100644 playbooks/roles/milvus/templates/test_connection.py.j2
create mode 100644 workflows/ai/Kconfig
create mode 100644 workflows/ai/Kconfig.docker
create mode 100644 workflows/ai/Kconfig.docker-storage
create mode 100644 workflows/ai/Kconfig.native
create mode 100644 workflows/ai/Makefile
create mode 100644 workflows/ai/scripts/analysis_config.json
create mode 100755 workflows/ai/scripts/analyze_results.py
create mode 100755 workflows/ai/scripts/generate_graphs.py
create mode 100755 workflows/ai/scripts/generate_html_report.py
diff --git a/.gitignore b/.gitignore
index e5a13676..75e4712d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -32,7 +32,6 @@ scripts/workflows/fstests/lib/__pycache__/
scripts/workflows/blktests/lib/__pycache__/
scripts/workflows/lib/__pycache__/
-
include/
# You can override role specific stuff on these
@@ -48,7 +47,9 @@ playbooks/secret.yml
playbooks/python/workflows/fstests/__pycache__/
playbooks/python/workflows/fstests/lib/__pycache__/
playbooks/python/workflows/fstests/gen_results_summary.pyc
+playbooks/roles/ai_run_benchmarks/files/__pycache__/
+workflows/ai/results/
workflows/pynfs/results/
workflows/fstests/new_expunge_files.txt
diff --git a/README.md b/README.md
index 0c30762a..cb5fbc1f 100644
--- a/README.md
+++ b/README.md
@@ -14,6 +14,7 @@ Table of Contents
* [reboot-limit](#reboot-limit)
* [sysbench](#sysbench)
* [fio-tests](#fio-tests)
+ * [AI workflow](#ai-workflow)
* [kdevops chats](#kdevops-chats)
* [kdevops on discord](#kdevops-on-discord)
* [kdevops IRC](#kdevops-irc)
@@ -273,6 +274,22 @@ A/B testing capabilities, and advanced graphing and visualization support. For
detailed configuration and usage information, refer to the
[kdevops fio-tests documentation](docs/fio-tests.md).
+### AI workflow
+
+kdevops now supports AI/ML system benchmarking, starting with vector databases
+like Milvus. Similar to fstests, you can quickly set up and benchmark AI
+infrastructure with just a few commands:
+
+```bash
+make defconfig-ai-milvus-docker
+make bringup
+make ai
+```
+
+The AI workflow supports A/B testing, filesystem performance impact analysis,
+and comprehensive benchmarking of vector similarity search workloads. For
+details, see the [kdevops AI workflow documentation](docs/ai/README.md).
+
## kdevops chats
We use discord and IRC. Right now we have more folks on discord than on IRC.
@@ -324,6 +341,7 @@ want to just use the kernel that comes with your Linux distribution.
* [kdevops NFS docs](docs/nfs.md)
* [kdevops selftests docs](docs/selftests.md)
* [kdevops reboot-limit docs](docs/reboot-limit.md)
+ * [kdevops AI workflow docs](docs/ai/README.md)
# kdevops general documentation
diff --git a/defconfigs/ai-milvus-docker b/defconfigs/ai-milvus-docker
new file mode 100644
index 00000000..ef5aa029
--- /dev/null
+++ b/defconfigs/ai-milvus-docker
@@ -0,0 +1,113 @@
+# AI benchmarking configuration for Milvus vector database testing
+CONFIG_KDEVOPS_FIRST_RUN=n
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_URI="qemu:///system"
+CONFIG_LIBVIRT_HOST_PASSTHROUGH=y
+CONFIG_LIBVIRT_MACHINE_TYPE_DEFAULT=y
+CONFIG_LIBVIRT_CPU_MODEL_PASSTHROUGH=y
+CONFIG_LIBVIRT_VCPUS=4
+CONFIG_LIBVIRT_RAM=8192
+CONFIG_LIBVIRT_OS_VARIANT="generic"
+CONFIG_LIBVIRT_STORAGE_POOL_PATH_CUSTOM=n
+CONFIG_LIBVIRT_STORAGE_POOL_CREATE=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="100"
+
+# Network configuration
+CONFIG_KDEVOPS_NETWORK_TYPE_NATUAL_BRIDGE=y
+
+# Workflow configuration
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI workflow configuration
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Milvus Docker configuration
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5=y
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_STRING="milvusdb/milvus:v2.5.10"
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_NAME="milvus-ai-benchmark"
+CONFIG_AI_VECTOR_DB_MILVUS_ETCD_CONTAINER_IMAGE_STRING="quay.io/coreos/etcd:v3.5.18"
+CONFIG_AI_VECTOR_DB_MILVUS_ETCD_CONTAINER_NAME="milvus-etcd"
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_CONTAINER_IMAGE_STRING="minio/minio:RELEASE.2023-03-20T20-16-18Z"
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_CONTAINER_NAME="milvus-minio"
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_ACCESS_KEY="minioadmin"
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_SECRET_KEY="minioadmin"
+
+# Docker storage configuration
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER_DATA_PATH="/data/milvus-data"
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER_ETCD_DATA_PATH="/data/milvus-etcd"
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER_MINIO_DATA_PATH="/data/milvus-minio"
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER_NETWORK_NAME="milvus-network"
+
+# Docker ports
+CONFIG_AI_VECTOR_DB_MILVUS_PORT=19530
+CONFIG_AI_VECTOR_DB_MILVUS_WEB_UI_PORT=9091
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_API_PORT=9000
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_CONSOLE_PORT=9001
+CONFIG_AI_VECTOR_DB_MILVUS_ETCD_CLIENT_PORT=2379
+CONFIG_AI_VECTOR_DB_MILVUS_ETCD_PEER_PORT=2380
+
+# Docker resource limits
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
+CONFIG_AI_VECTOR_DB_MILVUS_ETCD_MEMORY_LIMIT="1g"
+CONFIG_AI_VECTOR_DB_MILVUS_MINIO_MEMORY_LIMIT="2g"
+
+# Milvus connection configuration
+CONFIG_AI_VECTOR_DB_MILVUS_COLLECTION_NAME="benchmark_collection"
+CONFIG_AI_VECTOR_DB_MILVUS_DIMENSION=768
+CONFIG_AI_VECTOR_DB_MILVUS_DATASET_SIZE=1000000
+CONFIG_AI_VECTOR_DB_MILVUS_BATCH_SIZE=10000
+CONFIG_AI_VECTOR_DB_MILVUS_NUM_QUERIES=10000
+
+# Benchmark configuration
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+# Vector dataset configuration
+CONFIG_AI_VECTOR_DB_MILVUS_DIMENSION=128
+
+# Test runtime configuration
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns for CI testing
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_100=n
+
+# Batch size configuration for CI
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+CONFIG_AI_BENCHMARK_BATCH_100=n
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and graphing
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Filesystem configuration
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Baseline/dev testing setup
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+# Build Linux
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
diff --git a/defconfigs/ai-milvus-docker-ci b/defconfigs/ai-milvus-docker-ci
new file mode 100644
index 00000000..144a6490
--- /dev/null
+++ b/defconfigs/ai-milvus-docker-ci
@@ -0,0 +1,51 @@
+# SPDX-License-Identifier: copyleft-next-0.3.1
+#
+# AI vector database benchmarking for CI testing
+# Uses minimal dataset size and short runtime for quick verification
+
+CONFIG_KDEVOPS_FIRST_RUN=y
+CONFIG_GUESTFS=y
+CONFIG_GUESTFS_DEBIAN=y
+CONFIG_GUESTFS_DEBIAN_TRIXIE=y
+
+# Enable AI workflow
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+
+# Docker deployment
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# CI-optimized: Use custom small dataset
+CONFIG_AI_DATASET_CUSTOM=y
+
+# Small vector dimensions for faster processing
+CONFIG_AI_VECTOR_DIM_128=y
+
+# Minimal query configurations
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_BATCH_1=y
+
+# Fast HNSW indexing
+CONFIG_AI_INDEX_HNSW=y
+
+# Short runtime for CI
+# These will be overridden by environment variables in CI:
+# AI_VECTOR_DATASET_SIZE=1000
+# AI_BENCHMARK_RUNTIME=30
+
+# Reduced resource limits for CI
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="2g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="2.0"
+
+# Enable graphing for result verification
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+
+# XFS filesystem (fastest for AI workloads)
+CONFIG_AI_FILESYSTEM_XFS=y
+
+# A/B testing enabled for baseline/dev comparison
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
diff --git a/docs/ai/README.md b/docs/ai/README.md
new file mode 100644
index 00000000..94f9f6c0
--- /dev/null
+++ b/docs/ai/README.md
@@ -0,0 +1,108 @@
+# AI Workflow Documentation
+
+The kdevops AI workflow provides infrastructure for benchmarking and testing AI/ML systems, with initial support for vector databases.
+
+## Quick Start
+
+Just like other kdevops workflows (fstests, blktests), the AI workflow follows the same pattern:
+
+```bash
+make defconfig-ai-milvus-docker # Configure for AI vector database testing
+make bringup # Bring up the test environment
+make ai # Run the AI benchmarks
+make ai-baseline # Establish baseline results
+make ai-results # View results
+```
+
+## Supported Components
+
+### Vector Databases
+- [Milvus](vector-databases/milvus.md) - High-performance vector database for AI applications
+
+### Future Components (Planned)
+- Language Models (LLMs)
+- Embedding Services
+- Training Infrastructure
+- Inference Servers
+
+## Configuration Options
+
+The AI workflow can be configured through `make menuconfig`:
+
+1. **Vector Database Selection**
+ - Milvus (Docker or Native deployment)
+ - Future: Weaviate, Qdrant, Pinecone
+
+2. **Dataset Configuration**
+ - Dataset size (number of vectors)
+ - Vector dimensions
+ - Batch sizes
+
+3. **Benchmark Parameters**
+ - Query patterns
+ - Concurrency levels
+ - Runtime duration
+
+4. **Filesystem Testing**
+ - Test on different filesystems (XFS, ext4, btrfs)
+ - Compare performance across storage configurations
+
+## Pre-built Configurations
+
+Quick configurations for common use cases:
+
+- `defconfig-ai-milvus-docker` - Docker-based Milvus deployment
+- `defconfig-ai-milvus-docker-ci` - CI-optimized with minimal dataset
+- `defconfig-ai-milvus-native` - Native Milvus installation from source
+- `defconfig-ai-milvus-multifs` - Multi-filesystem performance comparison
+
+## A/B Testing Support
+
+Like other kdevops workflows, AI supports baseline/dev comparisons:
+
+```bash
+# Configure with A/B testing
+make menuconfig # Enable CONFIG_KDEVOPS_BASELINE_AND_DEV
+make ai-baseline # Run on baseline
+make ai-dev # Run on dev
+make ai-results # Compare results
+```
+
+## Results and Analysis
+
+The AI workflow generates comprehensive performance metrics:
+
+- Throughput (operations/second)
+- Latency percentiles (p50, p95, p99)
+- Resource utilization
+- Performance graphs and trends
+
+Results are stored in the configured results directory (default: `/data/ai-results/`).
+
+## Integration with CI/CD
+
+The workflow includes CI-optimized configurations that use:
+- Minimal datasets for quick validation
+- `/dev/null` storage for I/O testing without disk requirements
+- Environment variable overrides for runtime configuration
+
+Example CI usage:
+```bash
+AI_VECTOR_DATASET_SIZE=1000 AI_BENCHMARK_RUNTIME=30 make defconfig-ai-milvus-docker-ci
+make bringup
+make ai
+```
+
+## Workflow Architecture
+
+The AI workflow follows kdevops patterns:
+
+1. **Configuration** - Kconfig-based configuration system
+2. **Provisioning** - Ansible-based infrastructure setup
+3. **Execution** - Standardized test execution
+4. **Collection** - Automated result collection and analysis
+5. **Reporting** - Performance visualization and comparison
+
+For detailed usage of specific components, see:
+- [Vector Databases Overview](vector-databases/README.md)
+- [Milvus Usage Guide](vector-databases/milvus.md)
diff --git a/docs/ai/vector-databases/README.md b/docs/ai/vector-databases/README.md
new file mode 100644
index 00000000..2a3955d7
--- /dev/null
+++ b/docs/ai/vector-databases/README.md
@@ -0,0 +1,76 @@
+# Vector Database Testing
+
+Vector databases are specialized systems designed to store and search high-dimensional vectors, essential for modern AI applications like semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).
+
+## Overview
+
+The kdevops AI workflow supports comprehensive benchmarking of vector databases to evaluate:
+
+- **Ingestion Performance**: How fast vectors can be indexed
+- **Query Performance**: Search latency and throughput
+- **Scalability**: Performance under different dataset sizes
+- **Storage Efficiency**: Filesystem and storage backend impact
+- **Resource Utilization**: CPU, memory, and I/O patterns
+
+## Supported Vector Databases
+
+### Currently Implemented
+- **[Milvus](milvus.md)** - Industry-leading vector database with comprehensive feature set
+
+### Planned Support
+- **Weaviate** - GraphQL-based vector search engine
+- **Qdrant** - High-performance vector similarity search
+- **Pinecone** - Cloud-native vector database
+- **ChromaDB** - Embedded vector database
+
+## Common Benchmark Patterns
+
+All vector database benchmarks follow similar patterns:
+
+1. **Data Ingestion**
+ - Generate or load vector datasets
+ - Create collections/indexes
+ - Insert vectors in batches
+ - Measure indexing performance
+
+2. **Query Workloads**
+ - Single vector searches
+ - Batch query processing
+ - Filtered searches
+ - Range queries
+
+3. **Performance Metrics**
+ - Queries per second (QPS)
+ - Latency percentiles
+ - Recall accuracy
+ - Resource consumption
+
+## Filesystem Impact
+
+Vector databases heavily depend on storage performance. The workflow tests across:
+
+- **XFS**: Default for many production deployments
+- **ext4**: Traditional Linux filesystem
+- **btrfs**: Copy-on-write with compression support
+- **ZFS**: Advanced features for data integrity
+
+## Configuration Dimensions
+
+Vector database testing explores multiple dimensions:
+
+- **Vector Dimensions**: 128, 256, 512, 768, 1536
+- **Dataset Sizes**: 100K to 100M+ vectors
+- **Index Types**: HNSW, IVF, Flat, Annoy
+- **Distance Metrics**: L2, Cosine, IP
+- **Batch Sizes**: Impact on ingestion/query performance
+
+## Quick Start Example
+
+```bash
+make defconfig-ai-milvus-docker # Configure for Milvus testing
+make bringup # Start the environment
+make ai # Run benchmarks
+make ai-results # Check results
+```
+
+See individual database guides for detailed configuration and usage instructions.
diff --git a/docs/ai/vector-databases/milvus.md b/docs/ai/vector-databases/milvus.md
new file mode 100644
index 00000000..11172774
--- /dev/null
+++ b/docs/ai/vector-databases/milvus.md
@@ -0,0 +1,264 @@
+# Milvus Vector Database Testing
+
+Milvus is a high-performance, cloud-native vector database designed for billion-scale vector similarity search. This guide explains how to benchmark Milvus using the kdevops AI workflow.
+
+## Quick Start
+
+### Basic Workflow
+
+Just like fstests or blktests, the Milvus workflow follows the standard kdevops pattern:
+
+```bash
+make defconfig-ai-milvus-docker # 1. Configure for Milvus testing
+make bringup # 2. Provision the test environment
+make ai # 3. Run the Milvus benchmarks
+make ai-baseline # 4. Establish baseline performance
+make ai-results # 5. View results
+```
+
+That's it! The workflow handles all the complexity of setting up Milvus, generating test data, and running comprehensive benchmarks.
+
+## Deployment Options
+
+### Docker Deployment (Recommended)
+
+The easiest way to test Milvus:
+
+```bash
+make defconfig-ai-milvus-docker
+make bringup
+make ai
+```
+
+This deploys Milvus using Docker Compose with:
+- Milvus standalone server
+- etcd for metadata storage
+- MinIO for object storage
+- Automatic service orchestration
+
+### Native Deployment
+
+For testing Milvus performance without containerization overhead:
+
+```bash
+make defconfig-ai-milvus-native
+make bringup
+make ai
+```
+
+Builds Milvus from source and runs directly on the VM.
+
+### CI/Quick Test Mode
+
+For rapid validation in CI pipelines:
+
+```bash
+# Uses minimal dataset (1000 vectors) and short runtime (30s)
+make defconfig-ai-milvus-docker-ci
+make bringup
+make ai
+```
+
+Or with environment overrides:
+```bash
+AI_VECTOR_DATASET_SIZE=5000 AI_BENCHMARK_RUNTIME=60 make ai
+```
+
+## What Actually Happens
+
+When you run `make ai`, the workflow:
+
+1. **Deploys Milvus** - Starts all required services
+2. **Generates Test Data** - Creates random vectors of configured dimensions
+3. **Creates Collection** - Sets up Milvus collection with appropriate schema
+4. **Ingests Data** - Inserts vectors in batches, measuring throughput
+5. **Builds Index** - Creates HNSW/IVF index on vectors
+6. **Runs Queries** - Executes search workload with various patterns
+7. **Collects Metrics** - Gathers performance data and system metrics
+8. **Generates Reports** - Creates graphs and summary statistics
+
+## Configuration Options
+
+### Via menuconfig
+
+```bash
+make menuconfig
+# Navigate to: Workflows → AI → Vector Databases → Milvus
+```
+
+Key configuration options:
+
+- **Deployment Type**: Docker vs Native
+- **Dataset Size**: 100K to 100M+ vectors (default: 1M)
+- **Vector Dimensions**: 128, 256, 512, 768, 1536 (default: 768)
+- **Batch Size**: Vectors per insert batch (default: 10K)
+- **Index Type**: HNSW, IVF_FLAT, IVF_SQ8
+- **Query Count**: Number of search queries to run
+
+### Via Environment Variables
+
+Override configurations at runtime:
+
+```bash
+# Quick test with small dataset
+AI_VECTOR_DATASET_SIZE=10000 make ai
+
+# Extended benchmark
+AI_BENCHMARK_RUNTIME=3600 make ai
+
+# Custom vector dimensions
+AI_VECTOR_DIMENSIONS=1536 make ai
+```
+
+## Filesystem Testing
+
+Test Milvus performance on different filesystems:
+
+```bash
+# Test on multiple filesystems
+make defconfig-ai-milvus-multifs
+make bringup
+make ai
+
+# Creates separate VMs for each filesystem:
+# - XFS with various configurations
+# - ext4 with bigalloc
+# - btrfs with compression
+```
+
+## A/B Testing
+
+Compare baseline vs development configurations:
+
+```bash
+# Enable A/B testing in menuconfig
+make menuconfig # Enable CONFIG_KDEVOPS_BASELINE_AND_DEV
+make ai-baseline # Run baseline
+# Make changes (kernel, filesystem, Milvus config)
+make ai-dev # Run on dev
+make ai-results # Compare results
+```
+
+## Understanding Results
+
+Results are stored in `/data/ai-results/` (configurable) with:
+
+### Performance Metrics
+- **Ingestion Rate**: Vectors indexed per second
+- **Query Latency**: p50, p95, p99 latencies
+- **Query Throughput**: Queries per second (QPS)
+- **Index Build Time**: Time to build vector index
+- **Resource Usage**: CPU, memory, disk I/O
+
+### Output Files
+```
+/data/ai-results/
+├── milvus_benchmark_results.json # Raw benchmark data
+├── performance_summary.txt # Human-readable summary
+├── graphs/
+│ ├── ingestion_throughput.png
+│ ├── query_latency_percentiles.png
+│ └── qps_over_time.png
+└── system_metrics/ # iostat, vmstat data
+```
+
+## Common Tasks
+
+### View Current Milvus Status
+```bash
+ansible all -m shell -a "docker ps | grep milvus"
+```
+
+### Check Milvus Logs
+```bash
+ansible all -m shell -a "docker logs milvus-standalone"
+```
+
+### Reset and Re-run
+```bash
+make ai-destroy # Clean up Milvus
+make ai # Fresh run
+```
+
+### Run Specific Phases
+```bash
+make ai-vector-db-milvus-install # Just install Milvus
+make ai-vector-db-milvus-benchmark # Just run benchmarks
+make ai-vector-db-milvus-destroy # Clean up
+```
+
+## Advanced Configuration
+
+### Custom Index Parameters
+
+Edit Milvus collection configuration in menuconfig:
+- HNSW: M (connections), efConstruction
+- IVF: nlist (clusters), nprobe
+- Metric Type: L2, IP, Cosine
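+
+For example, the HNSW defaults set by `defconfigs/ai-milvus-docker` are:
+
+```
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+```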
+
+### Resource Limits
+
+For Docker deployment:
+```
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
+```
+
+### Multi-Node Testing
+
+Future support for distributed Milvus cluster testing across multiple nodes.
+
+## Troubleshooting
+
+### Common Issues
+
+1. **Out of Memory**: Reduce dataset size or increase VM memory
+2. **Slow Ingestion**: Check disk I/O, consider faster storage
+3. **Docker Issues**: Ensure Docker service is running on VMs
+
+### Debug Commands
+
+```bash
+# Check Milvus health
+ansible all -m uri -a "url=http://localhost:9091/health"
+
+# View resource usage
+ansible all -m shell -a "docker stats --no-stream"
+
+# Check disk space
+ansible all -m shell -a "df -h /data"
+```
+
+## Performance Tuning Tips
+
+1. **Storage**: Use NVMe/SSD for best performance
+2. **Memory**: Ensure sufficient RAM for dataset + indexes
+3. **CPU**: More cores help with parallel ingestion
+4. **Filesystem**: XFS often performs best for Milvus workloads
+5. **Batch Size**: Larger batches improve ingestion throughput
+
+## Integration with CI/CD
+
+Example GitHub Actions workflow:
+
+```yaml
+- name: Run Milvus CI benchmark
+ run: |
+ AI_VECTOR_DATASET_SIZE=1000 \
+ AI_BENCHMARK_RUNTIME=30 \
+ make defconfig-ai-milvus-docker-ci
+ make bringup
+ make ai
+ make ai-results
+```
+
+## Summary
+
+The Milvus workflow in kdevops makes it simple to:
+- Quickly deploy and benchmark Milvus
+- Compare performance across configurations
+- Test filesystem and kernel impacts
+- Generate reproducible results
+- Scale from quick CI tests to comprehensive benchmarks
+
+Just like running `make fstests`, you can now run `make ai` to benchmark vector databases!
diff --git a/kconfigs/workflows/Kconfig b/kconfigs/workflows/Kconfig
index 6b2a3769..70898a1a 100644
--- a/kconfigs/workflows/Kconfig
+++ b/kconfigs/workflows/Kconfig
@@ -214,6 +214,13 @@ config KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
This will dedicate your configuration to running only the
fio-tests workflow for comprehensive storage performance testing.
+config KDEVOPS_WORKFLOW_DEDICATE_AI
+ bool "ai"
+ select KDEVOPS_WORKFLOW_ENABLE_AI
+ help
+ This will dedicate your configuration to running only the
+ AI workflow for vector database performance testing.
+
endchoice
config KDEVOPS_WORKFLOW_NAME
@@ -229,6 +236,7 @@ config KDEVOPS_WORKFLOW_NAME
default "sysbench" if KDEVOPS_WORKFLOW_DEDICATE_SYSBENCH
default "mmtests" if KDEVOPS_WORKFLOW_DEDICATE_MMTESTS
default "fio-tests" if KDEVOPS_WORKFLOW_DEDICATE_FIO_TESTS
+ default "ai" if KDEVOPS_WORKFLOW_DEDICATE_AI
endif
@@ -338,6 +346,14 @@ config KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_FIO_TESTS
Select this option if you want to provision fio-tests on a
single target node for by-hand testing.
+config KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_AI
+ bool "ai"
+ select KDEVOPS_WORKFLOW_ENABLE_AI
+ depends on LIBVIRT || TERRAFORM_PRIVATE_NET
+ help
+ Select this option if you want to provision AI benchmarks on a
+ single target node for by-hand testing.
+
endif # !WORKFLOWS_DEDICATED_WORKFLOW
config KDEVOPS_WORKFLOW_ENABLE_FSTESTS
@@ -462,6 +478,17 @@ source "workflows/fio-tests/Kconfig"
endmenu
endif # KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS
+config KDEVOPS_WORKFLOW_ENABLE_AI
+ bool
+ output yaml
+ default y if KDEVOPS_WORKFLOW_NOT_DEDICATED_ENABLE_AI || KDEVOPS_WORKFLOW_DEDICATE_AI
+
+if KDEVOPS_WORKFLOW_ENABLE_AI
+menu "Configure and run AI benchmarks"
+source "workflows/ai/Kconfig"
+endmenu
+endif # KDEVOPS_WORKFLOW_ENABLE_AI
+
config KDEVOPS_WORKFLOW_ENABLE_SSD_STEADY_STATE
bool "Attain SSD steady state prior to tests"
output yaml
diff --git a/playbooks/ai.yml b/playbooks/ai.yml
new file mode 100644
index 00000000..b1613309
--- /dev/null
+++ b/playbooks/ai.yml
@@ -0,0 +1,11 @@
+---
+# Main AI workflow orchestration playbook
+# This demonstrates the scalable structure for AI workflows
+
+- name: AI Workflow - Vector Database Setup
+ ansible.builtin.import_playbook: ai_install.yml
+ when: ai_workflow_vector_db | default(true) | bool
+ tags: ['ai', 'setup']
+
+# Benchmarks are run separately via make ai-tests targets
+# They should not run during the setup phase (make ai)
diff --git a/playbooks/ai_benchmark.yml b/playbooks/ai_benchmark.yml
new file mode 100644
index 00000000..85fc117c
--- /dev/null
+++ b/playbooks/ai_benchmark.yml
@@ -0,0 +1,8 @@
+---
+- name: Run Milvus Vector Database Benchmarks
+ hosts: ai
+ vars:
+ ai_vector_db_milvus_benchmark_enable: true
+ roles:
+ - role: milvus
+ tags: ['ai', 'vector_db', 'milvus', 'benchmark']
diff --git a/playbooks/ai_destroy.yml b/playbooks/ai_destroy.yml
new file mode 100644
index 00000000..eef07b2a
--- /dev/null
+++ b/playbooks/ai_destroy.yml
@@ -0,0 +1,24 @@
+---
+- name: Destroy Milvus Vector Database
+ hosts: ai
+ become: true
+ tasks:
+ - name: Stop Milvus containers
+ community.docker.docker_compose:
+ project_src: "{{ ai_vector_db_milvus_config_dir }}"
+ state: absent
+ when: ai_vector_db_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+ - name: Remove Milvus data directories
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: absent
+ loop:
+ - "{{ ai_vector_db_milvus_data_dir }}"
+ - "{{ ai_vector_db_milvus_config_dir }}"
+ - "{{ ai_vector_db_milvus_log_dir }}"
+ when: ai_vector_db_force_destroy | default(false) | bool
+
+ tags: ['ai', 'vector_db', 'milvus', 'destroy']
diff --git a/playbooks/ai_install.yml b/playbooks/ai_install.yml
new file mode 100644
index 00000000..70b734e4
--- /dev/null
+++ b/playbooks/ai_install.yml
@@ -0,0 +1,8 @@
+---
+- name: Install Milvus Vector Database
+ hosts: ai
+ become: true
+ become_user: root
+ roles:
+ - role: milvus
+ tags: ['ai', 'vector_db', 'milvus', 'install']
diff --git a/playbooks/ai_results.yml b/playbooks/ai_results.yml
new file mode 100644
index 00000000..881295eb
--- /dev/null
+++ b/playbooks/ai_results.yml
@@ -0,0 +1,6 @@
+---
+- name: Collect and analyze AI benchmark results
+ hosts: ai
+ roles:
+ - ai_collect_results
+ tags: ['ai', 'ai_results']
diff --git a/playbooks/ai_setup.yml b/playbooks/ai_setup.yml
new file mode 100644
index 00000000..f0007ee2
--- /dev/null
+++ b/playbooks/ai_setup.yml
@@ -0,0 +1,6 @@
+---
+- name: Setup AI benchmark environment
+ hosts: ai
+ roles:
+ - ai_setup
+ tags: ['ai', 'ai_setup']
diff --git a/playbooks/ai_tests.yml b/playbooks/ai_tests.yml
new file mode 100644
index 00000000..1a5638fc
--- /dev/null
+++ b/playbooks/ai_tests.yml
@@ -0,0 +1,31 @@
+---
+# AI Tests/Benchmarks playbook
+# This ensures AI infrastructure is setup before running benchmarks
+
+- name: AI Tests - Ensure Milvus is installed
+ hosts: ai
+ become: true
+ become_user: root
+ roles:
+ - role: milvus
+ when: ai_vector_db_milvus | default(false) | bool
+ tags: ['ai', 'milvus', 'setup']
+
+- name: AI Tests - Vector Database Benchmarks
+ hosts: ai
+ become: true
+ vars:
+ # Skip infrastructure setup when running tests
+ ai_skip_setup: true
+ roles:
+ - role: ai_run_benchmarks
+ when: ai_vector_db_milvus | default(false) | bool
+ tags: ['ai', 'benchmark']
+
+- name: AI Tests - Results Collection
+ hosts: ai
+ become: true
+ roles:
+ - role: ai_collect_results
+ when: ai_collect_results | default(true) | bool
+ tags: ['ai', 'results']
diff --git a/playbooks/ai_uninstall.yml b/playbooks/ai_uninstall.yml
new file mode 100644
index 00000000..fb537664
--- /dev/null
+++ b/playbooks/ai_uninstall.yml
@@ -0,0 +1,6 @@
+---
+- name: Uninstall AI benchmark components
+ hosts: ai
+ roles:
+ - ai_uninstall
+ tags: ['ai', 'ai_uninstall']
diff --git a/playbooks/host_vars/debian13-ai-btrfs-default-dev.yml b/playbooks/host_vars/debian13-ai-btrfs-default-dev.yml
new file mode 100644
index 00000000..85b95a52
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-btrfs-default-dev.yml
@@ -0,0 +1,8 @@
+---
+# btrfs default configuration (dev)
+ai_docker_fstype: "btrfs"
+ai_docker_btrfs_mkfs_opts: "-f"
+filesystem_type: "btrfs"
+filesystem_block_size: "default"
+ai_filesystem: "btrfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-btrfs-default.yml b/playbooks/host_vars/debian13-ai-btrfs-default.yml
new file mode 100644
index 00000000..f4f18b9e
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-btrfs-default.yml
@@ -0,0 +1,8 @@
+---
+# btrfs default configuration
+ai_docker_fstype: "btrfs"
+ai_docker_btrfs_mkfs_opts: "-f"
+filesystem_type: "btrfs"
+filesystem_block_size: "default"
+ai_filesystem: "btrfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc-dev.yml b/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc-dev.yml
new file mode 100644
index 00000000..e4b1a9da
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc-dev.yml
@@ -0,0 +1,8 @@
+---
+# ext4 16k bigalloc configuration (dev)
+ai_docker_fstype: "ext4"
+ai_docker_ext4_mkfs_opts: "-b 4096 -C 16384 -O bigalloc"
+filesystem_type: "ext4"
+filesystem_block_size: "16k-bigalloc"
+ai_filesystem: "ext4"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc.yml b/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc.yml
new file mode 100644
index 00000000..a5624440
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-ext4-16k-bigalloc.yml
@@ -0,0 +1,8 @@
+---
+# ext4 16k bigalloc configuration
+ai_docker_fstype: "ext4"
+ai_docker_ext4_mkfs_opts: "-b 4096 -C 16384 -O bigalloc"
+filesystem_type: "ext4"
+filesystem_block_size: "16k-bigalloc"
+ai_filesystem: "ext4"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-ext4-4k-dev.yml b/playbooks/host_vars/debian13-ai-ext4-4k-dev.yml
new file mode 100644
index 00000000..6ca5fec5
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-ext4-4k-dev.yml
@@ -0,0 +1,8 @@
+---
+# ext4 4k block configuration (dev)
+ai_docker_fstype: "ext4"
+ai_docker_ext4_mkfs_opts: "-b 4096"
+filesystem_type: "ext4"
+filesystem_block_size: "4k"
+ai_filesystem: "ext4"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-ext4-4k.yml b/playbooks/host_vars/debian13-ai-ext4-4k.yml
new file mode 100644
index 00000000..f2840faa
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-ext4-4k.yml
@@ -0,0 +1,8 @@
+---
+# ext4 4k block configuration
+ai_docker_fstype: "ext4"
+ai_docker_ext4_mkfs_opts: "-b 4096"
+filesystem_type: "ext4"
+filesystem_block_size: "4k"
+ai_filesystem: "ext4"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-16k-4ks-dev.yml b/playbooks/host_vars/debian13-ai-xfs-16k-4ks-dev.yml
new file mode 100644
index 00000000..429e6461
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-16k-4ks-dev.yml
@@ -0,0 +1,10 @@
+---
+# XFS 16k block, 4k sector configuration (dev)
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 16384
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "16k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-16k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-16k-4ks.yml
new file mode 100644
index 00000000..15200810
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-16k-4ks.yml
@@ -0,0 +1,10 @@
+---
+# XFS 16k block, 4k sector configuration
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 16384
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "16k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-32k-4ks-dev.yml b/playbooks/host_vars/debian13-ai-xfs-32k-4ks-dev.yml
new file mode 100644
index 00000000..6f30a053
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-32k-4ks-dev.yml
@@ -0,0 +1,10 @@
+---
+# XFS 32k block, 4k sector configuration (dev)
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 32768
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "32k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-32k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-32k-4ks.yml
new file mode 100644
index 00000000..4c78e9a4
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-32k-4ks.yml
@@ -0,0 +1,10 @@
+---
+# XFS 32k block, 4k sector configuration
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 32768
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "32k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks-dev.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks-dev.yml
new file mode 100644
index 00000000..f8b8c55b
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-4k-4ks-dev.yml
@@ -0,0 +1,10 @@
+---
+# XFS 4k block, 4k sector configuration (dev)
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 4096
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "4k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
new file mode 100644
index 00000000..ffe9eb28
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
@@ -0,0 +1,10 @@
+---
+# XFS 4k block, 4k sector configuration
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 4096
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "4k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
\ No newline at end of file
diff --git a/playbooks/host_vars/debian13-ai-xfs-64k-4ks-dev.yml b/playbooks/host_vars/debian13-ai-xfs-64k-4ks-dev.yml
new file mode 100644
index 00000000..1590f154
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-64k-4ks-dev.yml
@@ -0,0 +1,10 @@
+---
+# XFS 64k block, 4k sector configuration (dev)
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 65536
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "64k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/host_vars/debian13-ai-xfs-64k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-64k-4ks.yml
new file mode 100644
index 00000000..482835c4
--- /dev/null
+++ b/playbooks/host_vars/debian13-ai-xfs-64k-4ks.yml
@@ -0,0 +1,10 @@
+---
+# XFS 64k block, 4k sector configuration
+ai_docker_fstype: "xfs"
+ai_docker_xfs_blocksize: 65536
+ai_docker_xfs_sectorsize: 4096
+ai_docker_xfs_mkfs_opts: ""
+filesystem_type: "xfs"
+filesystem_block_size: "64k-4ks"
+ai_filesystem: "xfs"
+ai_data_device_path: "/var/lib/docker"
diff --git a/playbooks/roles/ai_collect_results/files/analyze_results.py b/playbooks/roles/ai_collect_results/files/analyze_results.py
new file mode 100755
index 00000000..3d11fb11
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/files/analyze_results.py
@@ -0,0 +1,979 @@
+#!/usr/bin/env python3
+"""
+AI Benchmark Results Analysis and Visualization
+
+This script analyzes benchmark results and generates comprehensive graphs
+showing performance characteristics of the AI workload testing.
+"""
+
+import json
+import glob
+import os
+import sys
+import argparse
+import subprocess
+import platform
+from typing import List, Dict, Any
+import logging
+from datetime import datetime
+
+# Optional imports with graceful fallback
+GRAPHING_AVAILABLE = True
+try:
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+ import numpy as np
+except ImportError as e:
+ GRAPHING_AVAILABLE = False
+ print(f"Warning: Graphing libraries not available: {e}")
+ print("Install with: pip install pandas matplotlib seaborn numpy")
+
+
+class ResultsAnalyzer:
+ def __init__(self, results_dir: str, output_dir: str, config: Dict[str, Any]):
+ self.results_dir = results_dir
+ self.output_dir = output_dir
+ self.config = config
+ self.results_data = []
+
+ # Setup logging
+ logging.basicConfig(
+ level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
+ )
+ self.logger = logging.getLogger(__name__)
+
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Collect system information for DUT details
+ self.system_info = self._collect_system_info()
+
+ def _collect_system_info(self) -> Dict[str, Any]:
+ """Collect system information for DUT details in HTML report"""
+ info = {}
+
+ try:
+ # Basic system information
+ info["hostname"] = platform.node()
+ info["platform"] = platform.platform()
+ info["architecture"] = platform.architecture()[0]
+ info["processor"] = platform.processor()
+
+ # Memory information
+ try:
+ with open("/proc/meminfo", "r") as f:
+ meminfo = f.read()
+ for line in meminfo.split("\n"):
+ if "MemTotal:" in line:
+ info["total_memory"] = line.split()[1] + " kB"
+ break
+            except Exception:
+ info["total_memory"] = "Unknown"
+
+ # CPU information
+ try:
+ with open("/proc/cpuinfo", "r") as f:
+ cpuinfo = f.read()
+ cpu_count = cpuinfo.count("processor")
+ info["cpu_count"] = cpu_count
+
+ # Extract CPU model
+ for line in cpuinfo.split("\n"):
+ if "model name" in line:
+ info["cpu_model"] = line.split(":", 1)[1].strip()
+ break
+            except Exception:
+ info["cpu_count"] = "Unknown"
+ info["cpu_model"] = "Unknown"
+
+ # Storage information
+ info["storage_devices"] = self._get_storage_info()
+
+ # Virtualization detection
+ info["is_vm"] = self._detect_virtualization()
+
+ # Filesystem information for AI data directory
+ info["filesystem_info"] = self._get_filesystem_info()
+
+ except Exception as e:
+ self.logger.warning(f"Error collecting system information: {e}")
+
+ return info
+
+ def _get_storage_info(self) -> List[Dict[str, str]]:
+ """Get storage device information including NVMe details"""
+ devices = []
+
+ try:
+ # Get block devices
+ result = subprocess.run(
+ ["lsblk", "-J", "-o", "NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE"],
+ capture_output=True,
+ text=True,
+ )
+ if result.returncode == 0:
+ lsblk_data = json.loads(result.stdout)
+ for device in lsblk_data.get("blockdevices", []):
+ if device.get("type") == "disk":
+ dev_info = {
+ "name": device.get("name", ""),
+ "size": device.get("size", ""),
+ "type": "disk",
+ }
+
+ # Check if it's NVMe and get additional details
+ if device.get("name", "").startswith("nvme"):
+ nvme_info = self._get_nvme_info(device.get("name", ""))
+ dev_info.update(nvme_info)
+
+ devices.append(dev_info)
+ except Exception as e:
+ self.logger.warning(f"Error getting storage info: {e}")
+
+ return devices
+
+ def _get_nvme_info(self, device_name: str) -> Dict[str, str]:
+ """Get detailed NVMe device information"""
+ nvme_info = {}
+
+ try:
+ # Get NVMe identify info
+ result = subprocess.run(
+ ["nvme", "id-ctrl", f"/dev/{device_name}"],
+ capture_output=True,
+ text=True,
+ )
+ if result.returncode == 0:
+ output = result.stdout
+ for line in output.split("\n"):
+                    if line.strip().startswith("mn "):
+                        nvme_info["model"] = line.split(":", 1)[1].strip()
+                    elif line.strip().startswith("fr "):
+                        nvme_info["firmware"] = line.split(":", 1)[1].strip()
+                    elif line.strip().startswith("sn "):
+                        nvme_info["serial"] = line.split(":", 1)[1].strip()
+ except Exception as e:
+ self.logger.debug(f"Could not get NVMe info for {device_name}: {e}")
+
+ return nvme_info
+
+ def _detect_virtualization(self) -> str:
+ """Detect if running in a virtual environment"""
+ try:
+ # Check systemd-detect-virt
+ result = subprocess.run(
+ ["systemd-detect-virt"], capture_output=True, text=True
+ )
+ if result.returncode == 0:
+ virt_type = result.stdout.strip()
+ return virt_type if virt_type != "none" else "Physical"
+        except Exception:
+ pass
+
+ try:
+ # Check dmesg for virtualization hints
+ result = subprocess.run(["dmesg"], capture_output=True, text=True)
+ if result.returncode == 0:
+ dmesg_output = result.stdout.lower()
+ if "kvm" in dmesg_output:
+ return "KVM"
+ elif "vmware" in dmesg_output:
+ return "VMware"
+ elif "virtualbox" in dmesg_output:
+ return "VirtualBox"
+ elif "xen" in dmesg_output:
+ return "Xen"
+        except Exception:
+ pass
+
+ return "Unknown"
+
+ def _get_filesystem_info(self) -> Dict[str, str]:
+ """Get filesystem information for the AI benchmark directory"""
+ fs_info = {}
+
+ try:
+ # Get filesystem info for the results directory
+ result = subprocess.run(
+ ["df", "-T", self.results_dir], capture_output=True, text=True
+ )
+ if result.returncode == 0:
+ lines = result.stdout.strip().split("\n")
+ if len(lines) > 1:
+ fields = lines[1].split()
+ if len(fields) >= 2:
+ fs_info["filesystem_type"] = fields[1]
+ fs_info["mount_point"] = (
+ fields[6] if len(fields) > 6 else "Unknown"
+ )
+
+ # Get mount options
+ try:
+ with open("/proc/mounts", "r") as f:
+ for line in f:
+ parts = line.split()
+ if (
+ len(parts) >= 4
+ and fs_info.get("mount_point", "") in parts[1]
+ ):
+ fs_info["mount_options"] = parts[3]
+ break
+                    except Exception:
+                        pass
+ except Exception as e:
+ self.logger.warning(f"Error getting filesystem info: {e}")
+
+ return fs_info
+
+ def load_results(self) -> bool:
+ """Load all result files from the results directory"""
+ try:
+ pattern = os.path.join(self.results_dir, "results_*.json")
+ result_files = glob.glob(pattern)
+
+ if not result_files:
+ self.logger.warning(f"No result files found in {self.results_dir}")
+ return False
+
+ self.logger.info(f"Found {len(result_files)} result files")
+
+ for file_path in result_files:
+ try:
+ with open(file_path, "r") as f:
+ data = json.load(f)
+ data["_file"] = os.path.basename(file_path)
+ self.results_data.append(data)
+ except Exception as e:
+ self.logger.error(f"Error loading {file_path}: {e}")
+
+ self.logger.info(
+ f"Successfully loaded {len(self.results_data)} result sets"
+ )
+ return len(self.results_data) > 0
+
+ except Exception as e:
+ self.logger.error(f"Error loading results: {e}")
+ return False
+
+ def generate_summary_report(self) -> str:
+ """Generate a text summary report"""
+ try:
+ report = []
+ report.append("=" * 80)
+ report.append("AI BENCHMARK RESULTS SUMMARY")
+ report.append("=" * 80)
+ report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+ report.append(f"Total result sets: {len(self.results_data)}")
+ report.append("")
+
+ if not self.results_data:
+ report.append("No results to analyze.")
+ return "\n".join(report)
+
+ # Configuration summary
+ first_result = self.results_data[0]
+ config = first_result.get("config", {})
+
+ report.append("CONFIGURATION:")
+ report.append(
+ f" Vector dataset size: {config.get('vector_dataset_size', 'N/A'):,}"
+ )
+ report.append(
+ f" Vector dimensions: {config.get('vector_dimensions', 'N/A')}"
+ )
+ report.append(f" Index type: {config.get('index_type', 'N/A')}")
+ report.append(f" Benchmark iterations: {len(self.results_data)}")
+ report.append("")
+
+ # Insert performance summary
+ insert_times = []
+ insert_rates = []
+ for result in self.results_data:
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ insert_times.append(insert_perf.get("total_time_seconds", 0))
+ insert_rates.append(insert_perf.get("vectors_per_second", 0))
+
+ if insert_times:
+ report.append("INSERT PERFORMANCE:")
+ report.append(
+ f" Average insert time: {np.mean(insert_times):.2f} seconds"
+ )
+ report.append(
+ f" Average insert rate: {np.mean(insert_rates):.2f} vectors/sec"
+ )
+ report.append(
+ f" Insert rate range: {np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec"
+ )
+ report.append("")
+
+ # Index performance summary
+ index_times = []
+ for result in self.results_data:
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ index_times.append(index_perf.get("creation_time_seconds", 0))
+
+ if index_times:
+ report.append("INDEX PERFORMANCE:")
+ report.append(
+ f" Average index creation time: {np.mean(index_times):.2f} seconds"
+ )
+ report.append(
+ f" Index time range: {np.min(index_times):.2f} - {np.max(index_times):.2f} seconds"
+ )
+ report.append("")
+
+ # Query performance summary
+ report.append("QUERY PERFORMANCE:")
+ for result in self.results_data:
+ query_perf = result.get("query_performance", {})
+ if query_perf:
+ for topk, topk_data in query_perf.items():
+ report.append(f" {topk.upper()}:")
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = batch_data.get("average_time_seconds", 0)
+ report.append(
+ f" {batch}: {qps:.2f} QPS, {avg_time*1000:.2f}ms avg"
+ )
+ break # Only show first result for summary
+
+ return "\n".join(report)
+
+ except Exception as e:
+ self.logger.error(f"Error generating summary report: {e}")
+ return f"Error generating summary: {e}"
+
+ def generate_html_report(self) -> str:
+ """Generate comprehensive HTML report with DUT details and test configuration"""
+ try:
+ html = []
+
+ # HTML header
+ html.append("<!DOCTYPE html>")
+ html.append("<html lang='en'>")
+ html.append("<head>")
+ html.append(" <meta charset='UTF-8'>")
+ html.append(
+ " <meta name='viewport' content='width=device-width, initial-scale=1.0'>"
+ )
+ html.append(" <title>AI Benchmark Results Report</title>")
+ html.append(" <style>")
+ html.append(
+ " body { font-family: Arial, sans-serif; margin: 20px; line-height: 1.6; }"
+ )
+ html.append(
+ " .header { background-color: #f4f4f4; padding: 20px; border-radius: 5px; margin-bottom: 20px; }"
+ )
+ html.append(" .section { margin-bottom: 30px; }")
+ html.append(
+ " .section h2 { color: #333; border-bottom: 2px solid #007acc; padding-bottom: 5px; }"
+ )
+ html.append(" .section h3 { color: #555; }")
+ html.append(
+ " table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }"
+ )
+ html.append(
+ " th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }"
+ )
+ html.append(" th { background-color: #f2f2f2; font-weight: bold; }")
+ html.append(
+ " .metric-table td:first-child { font-weight: bold; width: 30%; }"
+ )
+ html.append(
+ " .config-table td:first-child { font-weight: bold; width: 40%; }"
+ )
+ html.append(" .performance-good { color: #27ae60; }")
+ html.append(" .performance-warning { color: #f39c12; }")
+ html.append(" .performance-poor { color: #e74c3c; }")
+ html.append(
+ " .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
+ )
+ html.append(" </style>")
+ html.append("</head>")
+ html.append("<body>")
+
+ # Report header
+ html.append(" <div class='header'>")
+ html.append(" <h1>AI Benchmark Results Report</h1>")
+ html.append(
+ f" <p><strong>Generated:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>"
+ )
+ html.append(
+ f" <p><strong>Test Results:</strong> {len(self.results_data)} benchmark iterations</p>"
+ )
+
+ # Test type identification
+ html.append(" <div class='highlight'>")
+ html.append(" <h3>🤖 AI Workflow Test Type</h3>")
+ html.append(
+ " <p><strong>Vector Database Performance Testing</strong> using <strong>Milvus Vector Database</strong></p>"
+ )
+ html.append(
+ " <p>This test evaluates AI workload performance including vector insertion, indexing, and similarity search operations.</p>"
+ )
+ html.append(" </div>")
+ html.append(" </div>")
+
+ # Device Under Test (DUT) Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📋 Device Under Test (DUT) Details</h2>")
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><td>Hostname</td><td>"
+ + str(self.system_info.get("hostname", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>System Type</td><td>"
+ + str(self.system_info.get("is_vm", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Platform</td><td>"
+ + str(self.system_info.get("platform", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Architecture</td><td>"
+ + str(self.system_info.get("architecture", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>CPU Model</td><td>"
+ + str(self.system_info.get("cpu_model", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>CPU Count</td><td>"
+ + str(self.system_info.get("cpu_count", "Unknown"))
+ + " cores</td></tr>"
+ )
+ html.append(
+ " <tr><td>Total Memory</td><td>"
+ + str(self.system_info.get("total_memory", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Storage devices section
+ html.append(" <h3>💾 Storage Configuration</h3>")
+ storage_devices = self.system_info.get("storage_devices", [])
+ if storage_devices:
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Device</th><th>Size</th><th>Type</th><th>Model</th><th>Firmware</th></tr>"
+ )
+ for device in storage_devices:
+ model = device.get("model", "N/A")
+ firmware = device.get("firmware", "N/A")
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{device.get('name', 'Unknown')}</td>"
+ )
+ html.append(
+ f" <td>{device.get('size', 'Unknown')}</td>"
+ )
+ html.append(
+ f" <td>{device.get('type', 'Unknown')}</td>"
+ )
+ html.append(f" <td>{model}</td>")
+ html.append(f" <td>{firmware}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+ else:
+ html.append(" <p>No storage device information available.</p>")
+
+ # Filesystem section
+ html.append(" <h3>🗂️ Filesystem Configuration</h3>")
+ fs_info = self.system_info.get("filesystem_info", {})
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><td>Filesystem Type</td><td>"
+ + str(fs_info.get("filesystem_type", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Mount Point</td><td>"
+ + str(fs_info.get("mount_point", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Mount Options</td><td>"
+ + str(fs_info.get("mount_options", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(" </table>")
+ html.append(" </div>")
+
+ # Test Configuration Section
+ if self.results_data:
+ first_result = self.results_data[0]
+ config = first_result.get("config", {})
+
+ html.append(" <div class='section'>")
+ html.append(" <h2>⚙️ AI Test Configuration</h2>")
+ html.append(" <table class='config-table'>")
+ html.append(
+ f" <tr><td>Vector Dataset Size</td><td>{config.get('vector_dataset_size', 'N/A'):,} vectors</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Vector Dimensions</td><td>{config.get('vector_dimensions', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Type</td><td>{config.get('index_type', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Benchmark Iterations</td><td>{len(self.results_data)}</td></tr>"
+ )
+
+ # Add index-specific parameters
+ if config.get("index_type") == "HNSW":
+ html.append(
+ f" <tr><td>HNSW M Parameter</td><td>{config.get('hnsw_m', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>HNSW ef Construction</td><td>{config.get('hnsw_ef_construction', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>HNSW ef Search</td><td>{config.get('hnsw_ef', 'N/A')}</td></tr>"
+ )
+ elif config.get("index_type") == "IVF_FLAT":
+ html.append(
+ f" <tr><td>IVF nlist</td><td>{config.get('ivf_nlist', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>IVF nprobe</td><td>{config.get('ivf_nprobe', 'N/A')}</td></tr>"
+ )
+
+ html.append(" </table>")
+ html.append(" </div>")
+
+ # Performance Results Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📊 Performance Results Summary</h2>")
+
+ if self.results_data:
+ # Insert performance
+ insert_times = [
+ r.get("insert_performance", {}).get("total_time_seconds", 0)
+ for r in self.results_data
+ ]
+ insert_rates = [
+ r.get("insert_performance", {}).get("vectors_per_second", 0)
+ for r in self.results_data
+ ]
+
+ if insert_times and any(t > 0 for t in insert_times):
+ html.append(" <h3>📈 Vector Insert Performance</h3>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Index performance
+ index_times = [
+ r.get("index_performance", {}).get("creation_time_seconds", 0)
+ for r in self.results_data
+ ]
+ if index_times and any(t > 0 for t in index_times):
+ html.append(" <h3>🔗 Index Creation Performance</h3>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Query performance
+ html.append(" <h3>🔍 Query Performance</h3>")
+ first_query_perf = self.results_data[0].get("query_performance", {})
+ if first_query_perf:
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ )
+
+ for topk, topk_data in first_query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = batch_data.get("average_time_seconds", 0) * 1000
+
+ # Color coding for performance
+ qps_class = ""
+ if qps > 1000:
+ qps_class = "performance-good"
+ elif qps > 100:
+ qps_class = "performance-warning"
+ else:
+ qps_class = "performance-poor"
+
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{topk.replace('topk_', 'Top-')}</td>"
+ )
+ html.append(
+ f" <td>{batch.replace('batch_', 'Batch ')}</td>"
+ )
+ html.append(
+ f" <td class='{qps_class}'>{qps:.2f}</td>"
+ )
+ html.append(f" <td>{avg_time:.2f}</td>")
+ html.append(f" </tr>")
+
+ html.append(" </table>")
+
+ html.append(" </div>")
+
+ # Footer
+ html.append(" <div class='section'>")
+ html.append(" <h2>📝 Notes</h2>")
+ html.append(" <ul>")
+ html.append(
+ " <li>This report was generated automatically by the AI benchmark analysis tool</li>"
+ )
+ html.append(
+ " <li>Performance metrics are averaged across all benchmark iterations</li>"
+ )
+ html.append(
+ " <li>QPS (Queries Per Second) values are color-coded: <span class='performance-good'>Green (>1000)</span>, <span class='performance-warning'>Orange (100-1000)</span>, <span class='performance-poor'>Red (<100)</span></li>"
+ )
+ html.append(
+ " <li>Storage device information may require root privileges to display NVMe details</li>"
+ )
+ html.append(" </ul>")
+ html.append(" </div>")
+
+ html.append("</body>")
+ html.append("</html>")
+
+ return "\n".join(html)
+
+ except Exception as e:
+ self.logger.error(f"Error generating HTML report: {e}")
+ return (
+ f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
+ )
+
+ def generate_graphs(self) -> bool:
+ """Generate performance visualization graphs"""
+ if not GRAPHING_AVAILABLE:
+ self.logger.warning(
+ "Graphing libraries not available, skipping graph generation"
+ )
+ return False
+
+ try:
+ # Set matplotlib style
+ if self.config.get("graph_theme", "default") != "default":
+ plt.style.use(self.config["graph_theme"])
+
+ # Graph 1: Insert Performance
+ self._plot_insert_performance()
+
+ # Graph 2: Query Performance by Top-K
+ self._plot_query_performance()
+
+ # Graph 3: Index Creation Time
+ self._plot_index_performance()
+
+ # Graph 4: Performance Comparison Matrix
+ self._plot_performance_matrix()
+
+ self.logger.info("Graphs generated successfully")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Error generating graphs: {e}")
+ return False
+
+ def _plot_insert_performance(self):
+ """Plot insert performance metrics"""
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # Extract insert data
+ iterations = []
+ insert_rates = []
+ insert_times = []
+
+ for i, result in enumerate(self.results_data):
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ iterations.append(i + 1)
+ insert_rates.append(insert_perf.get("vectors_per_second", 0))
+ insert_times.append(insert_perf.get("total_time_seconds", 0))
+
+ # Plot insert rate
+ ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Vector Insert Rate Performance")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Vector Insert Time Performance")
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ output_file = os.path.join(
+ self.output_dir,
+ f"insert_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_query_performance(self):
+ """Plot query performance metrics"""
+ if not self.results_data:
+ return
+
+ # Collect query performance data
+ query_data = []
+ for result in self.results_data:
+ query_perf = result.get("query_performance", {})
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ query_data.append(
+ {
+ "topk": topk.replace("topk_", ""),
+ "batch": batch.replace("batch_", ""),
+ "qps": batch_data.get("queries_per_second", 0),
+ "avg_time": batch_data.get("average_time_seconds", 0)
+ * 1000, # Convert to ms
+ }
+ )
+
+ if not query_data:
+ return
+
+ df = pd.DataFrame(query_data)
+
+ # Create subplots
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # QPS heatmap
+ qps_pivot = df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
+ ax1.set_title("Queries Per Second (QPS)")
+ ax1.set_xlabel("Batch Size")
+ ax1.set_ylabel("Top-K")
+
+ # Latency heatmap
+ latency_pivot = df.pivot_table(
+ values="avg_time", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
+ ax2.set_title("Average Query Latency (ms)")
+ ax2.set_xlabel("Batch Size")
+ ax2.set_ylabel("Top-K")
+
+ plt.tight_layout()
+ output_file = os.path.join(
+ self.output_dir,
+ f"query_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_index_performance(self):
+ """Plot index creation performance"""
+ iterations = []
+ index_times = []
+
+ for i, result in enumerate(self.results_data):
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ iterations.append(i + 1)
+ index_times.append(index_perf.get("creation_time_seconds", 0))
+
+ if not index_times:
+ return
+
+ plt.figure(figsize=(10, 6))
+ plt.bar(iterations, index_times, alpha=0.7, color="green")
+ plt.xlabel("Iteration")
+ plt.ylabel("Index Creation Time (seconds)")
+ plt.title("Index Creation Performance")
+ plt.grid(True, alpha=0.3)
+
+ # Add average line
+ avg_time = np.mean(index_times)
+ plt.axhline(
+ y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
+ )
+ plt.legend()
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"index_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_performance_matrix(self):
+ """Plot comprehensive performance comparison matrix"""
+ if len(self.results_data) < 2:
+ return
+
+ # Extract key metrics for comparison
+ metrics = []
+ for i, result in enumerate(self.results_data):
+ insert_perf = result.get("insert_performance", {})
+ index_perf = result.get("index_performance", {})
+
+ metric = {
+ "iteration": i + 1,
+ "insert_rate": insert_perf.get("vectors_per_second", 0),
+ "index_time": index_perf.get("creation_time_seconds", 0),
+ }
+
+ # Add query metrics
+ query_perf = result.get("query_performance", {})
+ if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
+ metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
+ "queries_per_second", 0
+ )
+
+ metrics.append(metric)
+
+ df = pd.DataFrame(metrics)
+
+ # Normalize metrics for comparison
+ numeric_cols = ["insert_rate", "index_time", "query_qps"]
+ for col in numeric_cols:
+            if col in df.columns:
+                norm = (df[col] - df[col].min()) / (df[col].max() - df[col].min() + 1e-6)
+                # index_time is better when lower, so invert its normalized value
+                df[f"{col}_norm"] = 1.0 - norm if col == "index_time" else norm
+
+ # Create radar chart
+ fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
+
+ angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
+ angles += angles[:1] # Complete the circle
+
+ for i, row in df.iterrows():
+ values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
+ values += values[:1] # Complete the circle
+
+ ax.plot(
+ angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
+ )
+ ax.fill(angles, values, alpha=0.25)
+
+ ax.set_xticks(angles[:-1])
+ ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
+ ax.set_ylim(0, 1)
+ ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
+ ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"performance_matrix.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def analyze(self) -> bool:
+ """Run complete analysis"""
+ self.logger.info("Starting results analysis...")
+
+ if not self.load_results():
+ return False
+
+ # Generate summary report
+ summary = self.generate_summary_report()
+ summary_file = os.path.join(self.output_dir, "benchmark_summary.txt")
+ with open(summary_file, "w") as f:
+ f.write(summary)
+ self.logger.info(f"Summary report saved to {summary_file}")
+
+ # Generate HTML report
+ html_report = self.generate_html_report()
+ html_file = os.path.join(self.output_dir, "benchmark_report.html")
+ with open(html_file, "w") as f:
+ f.write(html_report)
+ self.logger.info(f"HTML report saved to {html_file}")
+
+ # Generate graphs if enabled
+ if self.config.get("enable_graphing", True):
+ self.generate_graphs()
+
+ # Create consolidated JSON report
+ consolidated_file = os.path.join(self.output_dir, "consolidated_results.json")
+ with open(consolidated_file, "w") as f:
+ json.dump(
+ {
+ "summary": summary.split("\n"),
+ "raw_results": self.results_data,
+ "analysis_timestamp": datetime.now().isoformat(),
+ "system_info": self.system_info,
+ },
+ f,
+ indent=2,
+ )
+
+ self.logger.info("Analysis completed successfully")
+ return True
+
+
+def main():
+ parser = argparse.ArgumentParser(description="Analyze AI benchmark results")
+ parser.add_argument(
+ "--results-dir", required=True, help="Directory containing result files"
+ )
+ parser.add_argument(
+ "--output-dir", required=True, help="Directory for analysis output"
+ )
+ parser.add_argument("--config", help="Analysis configuration file (JSON)")
+
+ args = parser.parse_args()
+
+ # Load configuration
+ config = {
+ "enable_graphing": True,
+ "graph_format": "png",
+ "graph_dpi": 300,
+ "graph_theme": "default",
+ }
+
+ if args.config:
+ try:
+ with open(args.config, "r") as f:
+ config.update(json.load(f))
+ except Exception as e:
+ print(f"Error loading config file: {e}")
+
+ # Run analysis
+ analyzer = ResultsAnalyzer(args.results_dir, args.output_dir, config)
+ success = analyzer.analyze()
+
+ return 0 if success else 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
new file mode 100755
index 00000000..645bac9e
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
@@ -0,0 +1,548 @@
+#!/usr/bin/env python3
+"""
+Generate meaningful graphs for AI benchmark results
+Focus on QPS and Latency metrics that matter
+"""
+
+import json
+import os
+import sys
+import glob
+import numpy as np
+import matplotlib
+
+matplotlib.use("Agg") # Use non-interactive backend
+import matplotlib.pyplot as plt
+from datetime import datetime
+from pathlib import Path
+from collections import defaultdict
+import subprocess
+
+
+def extract_filesystem_from_filename(filename):
+ """Extract filesystem type from result filename"""
+ # Expected format: results_debian13-ai-xfs-4k-4ks_1.json or results_debian13-ai-ext4-4k_1.json
+ if "debian13-ai-" in filename:
+ # Remove the "results_" prefix and ".json" suffix
+ node_name = filename.replace("results_", "").replace(".json", "")
+ # Remove the iteration number at the end
+ if "_" in node_name:
+ parts = node_name.split("_")
+ node_name = "_".join(parts[:-1]) # Remove last part (iteration)
+
+ # Extract filesystem type from node name
+ if "-xfs-" in node_name:
+ return "xfs"
+ elif "-ext4-" in node_name:
+ return "ext4"
+ elif "-btrfs-" in node_name:
+ return "btrfs"
+
+ return "unknown"
+
+def extract_node_config_from_filename(filename):
+ """Extract detailed node configuration from filename"""
+ # Expected format: results_debian13-ai-xfs-4k-4ks_1.json
+ if "debian13-ai-" in filename:
+ # Remove the "results_" prefix and ".json" suffix
+ node_name = filename.replace("results_", "").replace(".json", "")
+ # Remove the iteration number at the end
+ if "_" in node_name:
+ parts = node_name.split("_")
+ node_name = "_".join(parts[:-1]) # Remove last part (iteration)
+
+ # Remove -dev suffix if present
+ node_name = node_name.replace("-dev", "")
+
+ return node_name.replace("debian13-ai-", "")
+
+ return "unknown"
+
+def detect_filesystem():
+ """Detect the filesystem type of /data on test nodes"""
+ # This is now a fallback - we primarily use filename-based detection
+ try:
+ # Try to get filesystem info from a test node
+ result = subprocess.run(
+ ["ssh", "debian13-ai", "df -T /data | tail -1"],
+ capture_output=True,
+ text=True,
+ timeout=5,
+ )
+ if result.returncode == 0:
+ parts = result.stdout.strip().split()
+ if len(parts) >= 2:
+ return parts[1] # filesystem type is second column
+    except Exception:
+ pass
+
+ # Fallback to local filesystem check
+ try:
+ result = subprocess.run(["df", "-T", "."], capture_output=True, text=True)
+ if result.returncode == 0:
+ lines = result.stdout.strip().split("\n")
+ if len(lines) > 1:
+ parts = lines[1].split()
+ if len(parts) >= 2:
+ return parts[1]
+    except Exception:
+ pass
+
+ return "unknown"
+
+
+def load_results(results_dir):
+ """Load all JSON result files from the directory"""
+ results = []
+ json_files = glob.glob(os.path.join(results_dir, "results_*.json"))
+
+ for json_file in json_files:
+ try:
+ with open(json_file, "r") as f:
+ data = json.load(f)
+
+ # Extract node type from filename
+ filename = os.path.basename(json_file)
+ data["filename"] = filename
+
+ # Extract filesystem type and config from filename
+ data["filesystem"] = extract_filesystem_from_filename(filename)
+ data["node_config"] = extract_node_config_from_filename(filename)
+
+ # Determine if it's baseline or dev
+ if "-dev_" in filename or "-dev." in filename:
+ data["node_type"] = "dev"
+ data["is_dev"] = True
+ else:
+ data["node_type"] = "baseline"
+ data["is_dev"] = False
+
+ # Extract iteration number
+ if "_" in filename:
+ parts = filename.split("_")
+ iteration = parts[-1].replace(".json", "")
+ data["iteration"] = int(iteration) if iteration.isdigit() else 1
+ else:
+ data["iteration"] = 1
+
+ results.append(data)
+ except Exception as e:
+ print(f"Error loading {json_file}: {e}")
+
+ return results
+
+
+def create_qps_comparison_chart(results, output_dir):
+ """Create a clear QPS comparison chart between baseline and dev"""
+
+ # Organize data by node type and test configuration
+ baseline_data = defaultdict(list)
+ dev_data = defaultdict(list)
+
+ for result in results:
+ if "query_performance" not in result:
+ continue
+
+ qp = result["query_performance"]
+ node_type = result.get("node_type", "unknown")
+
+ # Extract QPS for different configurations
+ for topk in ["topk_1", "topk_10", "topk_100"]:
+ if topk not in qp:
+ continue
+ for batch in ["batch_1", "batch_10", "batch_100"]:
+ if batch not in qp[topk]:
+ continue
+
+ config_name = f"{topk}_{batch}"
+ qps = qp[topk][batch].get("queries_per_second", 0)
+
+ if node_type == "dev":
+ dev_data[config_name].append(qps)
+ else:
+ baseline_data[config_name].append(qps)
+
+ # Calculate averages
+ configs = sorted(set(baseline_data.keys()) | set(dev_data.keys()))
+ baseline_avg = [
+ np.mean(baseline_data[c]) if baseline_data[c] else 0 for c in configs
+ ]
+ dev_avg = [np.mean(dev_data[c]) if dev_data[c] else 0 for c in configs]
+
+ # Create the plot
+ fig, ax = plt.subplots(figsize=(14, 8))
+
+ x = np.arange(len(configs))
+ width = 0.35
+
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_avg, width, label="Baseline", color="#2E86AB"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_avg, width, label="Development", color="#A23B72"
+ )
+
+ # Customize the plot
+ ax.set_xlabel("Query Configuration", fontsize=12)
+ ax.set_ylabel("Queries Per Second (QPS)", fontsize=12)
+ fs_type = results[0].get("filesystem", "unknown") if results else "unknown"
+ ax.set_title(
+ f"Milvus Query Performance Comparison\nFilesystem: {fs_type.upper()}",
+ fontsize=14,
+ fontweight="bold",
+ )
+ ax.set_xticks(x)
+ ax.set_xticklabels([c.replace("_", "\n") for c in configs], rotation=45, ha="right")
+ ax.legend(fontsize=11)
+ ax.grid(True, alpha=0.3, axis="y")
+
+ # Add value labels on bars
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ plt.tight_layout()
+ plt.savefig(
+ os.path.join(output_dir, "qps_comparison.png"), dpi=150, bbox_inches="tight"
+ )
+ plt.close()
+
+    print("Generated QPS comparison chart")
+
+
+def create_latency_comparison_chart(results, output_dir):
+ """Create latency comparison chart (lower is better)"""
+
+ # Organize data by node type and test configuration
+ baseline_latency = defaultdict(list)
+ dev_latency = defaultdict(list)
+
+ for result in results:
+ if "query_performance" not in result:
+ continue
+
+ qp = result["query_performance"]
+ node_type = result.get("node_type", "unknown")
+
+ # Extract latency for different configurations
+ for topk in ["topk_1", "topk_10", "topk_100"]:
+ if topk not in qp:
+ continue
+ for batch in ["batch_1", "batch_10", "batch_100"]:
+ if batch not in qp[topk]:
+ continue
+
+ config_name = f"{topk}_{batch}"
+ # Convert to milliseconds for readability
+ latency_ms = qp[topk][batch].get("average_time_seconds", 0) * 1000
+
+ if node_type == "dev":
+ dev_latency[config_name].append(latency_ms)
+ else:
+ baseline_latency[config_name].append(latency_ms)
+
+ # Calculate averages
+ configs = sorted(set(baseline_latency.keys()) | set(dev_latency.keys()))
+ baseline_avg = [
+ np.mean(baseline_latency[c]) if baseline_latency[c] else 0 for c in configs
+ ]
+ dev_avg = [np.mean(dev_latency[c]) if dev_latency[c] else 0 for c in configs]
+
+ # Create the plot
+ fig, ax = plt.subplots(figsize=(14, 8))
+
+ x = np.arange(len(configs))
+ width = 0.35
+
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_avg, width, label="Baseline", color="#2E86AB"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_avg, width, label="Development", color="#A23B72"
+ )
+
+ # Customize the plot
+ ax.set_xlabel("Query Configuration", fontsize=12)
+ ax.set_ylabel("Average Latency (milliseconds)", fontsize=12)
+ fs_type = results[0].get("filesystem", "unknown") if results else "unknown"
+ ax.set_title(
+ f"Milvus Query Latency Comparison (Lower is Better)\nFilesystem: {fs_type.upper()}",
+ fontsize=14,
+ fontweight="bold",
+ )
+ ax.set_xticks(x)
+ ax.set_xticklabels([c.replace("_", "\n") for c in configs], rotation=45, ha="right")
+ ax.legend(fontsize=11)
+ ax.grid(True, alpha=0.3, axis="y")
+
+ # Add value labels on bars
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.1f}ms",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ plt.tight_layout()
+ plt.savefig(
+ os.path.join(output_dir, "latency_comparison.png"), dpi=150, bbox_inches="tight"
+ )
+ plt.close()
+
+    print("Generated latency comparison chart")
+
+
+def create_insert_performance_chart(results, output_dir):
+ """Create insert performance comparison"""
+
+ baseline_insert = []
+ dev_insert = []
+
+ for result in results:
+ if "insert_performance" not in result:
+ continue
+
+ vectors_per_sec = result["insert_performance"].get("vectors_per_second", 0)
+ node_type = result.get("node_type", "unknown")
+
+ if node_type == "dev":
+ dev_insert.append(vectors_per_sec)
+ else:
+ baseline_insert.append(vectors_per_sec)
+
+ if not baseline_insert and not dev_insert:
+ return
+
+ # Create box plot for insert performance
+ fig, ax = plt.subplots(figsize=(10, 6))
+
+ data_to_plot = []
+ labels = []
+
+ if baseline_insert:
+ data_to_plot.append(baseline_insert)
+ labels.append("Baseline")
+ if dev_insert:
+ data_to_plot.append(dev_insert)
+ labels.append("Development")
+
+ bp = ax.boxplot(data_to_plot, labels=labels, patch_artist=True)
+
+ # Color the boxes
+ colors = ["#2E86AB", "#A23B72"]
+ for patch, color in zip(bp["boxes"], colors[: len(bp["boxes"])]):
+ patch.set_facecolor(color)
+ patch.set_alpha(0.7)
+
+ # Add individual points
+ for i, data in enumerate(data_to_plot, 1):
+ x = np.random.normal(i, 0.04, size=len(data))
+ ax.scatter(x, data, alpha=0.4, s=30, color="black")
+
+ ax.set_ylabel("Vectors per Second", fontsize=12)
+ fs_type = results[0].get("filesystem", "unknown") if results else "unknown"
+ ax.set_title(
+ f"Insert Performance Distribution\nFilesystem: {fs_type.upper()}",
+ fontsize=14,
+ fontweight="bold",
+ )
+ ax.grid(True, alpha=0.3, axis="y")
+
+ # Add mean values
+ for i, data in enumerate(data_to_plot, 1):
+ mean_val = np.mean(data)
+ ax.text(
+ i,
+ mean_val,
+ f"μ={mean_val:.0f}",
+ ha="center",
+ va="bottom",
+ fontweight="bold",
+ )
+
+ plt.tight_layout()
+ plt.savefig(
+ os.path.join(output_dir, "insert_performance.png"), dpi=150, bbox_inches="tight"
+ )
+ plt.close()
+
+    print("Generated insert performance chart")
+
+
+def create_performance_summary_table(results, output_dir):
+ """Create a performance summary table as an image"""
+
+ # Calculate summary statistics
+ summary_data = {"Metric": [], "Baseline": [], "Development": [], "Improvement": []}
+
+ # Insert performance
+ baseline_insert = []
+ dev_insert = []
+
+ for result in results:
+ if "insert_performance" in result:
+ vectors_per_sec = result["insert_performance"].get("vectors_per_second", 0)
+ if result.get("node_type") == "dev":
+ dev_insert.append(vectors_per_sec)
+ else:
+ baseline_insert.append(vectors_per_sec)
+
+ if baseline_insert and dev_insert:
+ baseline_avg = np.mean(baseline_insert)
+ dev_avg = np.mean(dev_insert)
+ improvement = ((dev_avg - baseline_avg) / baseline_avg) * 100
+
+ summary_data["Metric"].append("Insert Rate (vec/s)")
+ summary_data["Baseline"].append(f"{baseline_avg:.0f}")
+ summary_data["Development"].append(f"{dev_avg:.0f}")
+ summary_data["Improvement"].append(f"{improvement:+.1f}%")
+
+ # Query performance (best case)
+ baseline_best_qps = 0
+ dev_best_qps = 0
+
+ for result in results:
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ for topk in qp.values():
+ for batch in topk.values():
+ qps = batch.get("queries_per_second", 0)
+ if result.get("node_type") == "dev":
+ dev_best_qps = max(dev_best_qps, qps)
+ else:
+ baseline_best_qps = max(baseline_best_qps, qps)
+
+ if baseline_best_qps and dev_best_qps:
+ improvement = ((dev_best_qps - baseline_best_qps) / baseline_best_qps) * 100
+ summary_data["Metric"].append("Best Query QPS")
+ summary_data["Baseline"].append(f"{baseline_best_qps:.0f}")
+ summary_data["Development"].append(f"{dev_best_qps:.0f}")
+ summary_data["Improvement"].append(f"{improvement:+.1f}%")
+
+ # Create table plot
+ # Check if we have data to create a table
+ if not summary_data["Metric"]:
+ print("No comparison data available for performance summary table")
+ return
+
+ fig, ax = plt.subplots(figsize=(10, 3))
+ ax.axis("tight")
+ ax.axis("off")
+
+ table_data = []
+ for i in range(len(summary_data["Metric"])):
+ table_data.append(
+ [
+ summary_data["Metric"][i],
+ summary_data["Baseline"][i],
+ summary_data["Development"][i],
+ summary_data["Improvement"][i],
+ ]
+ )
+
+ table = ax.table(
+ cellText=table_data,
+ colLabels=["Metric", "Baseline", "Development", "Change"],
+ cellLoc="center",
+ loc="center",
+ colWidths=[0.3, 0.2, 0.2, 0.2],
+ )
+
+ table.auto_set_font_size(False)
+ table.set_fontsize(11)
+ table.scale(1.2, 1.5)
+
+ # Style the header
+ for i in range(4):
+ table[(0, i)].set_facecolor("#2E86AB")
+ table[(0, i)].set_text_props(weight="bold", color="white")
+
+ # Color improvement cells
+ for i in range(1, len(table_data) + 1):
+ if "+" in table_data[i - 1][3]:
+ table[(i, 3)].set_facecolor("#90EE90")
+ elif "-" in table_data[i - 1][3]:
+ table[(i, 3)].set_facecolor("#FFB6C1")
+
+ fs_type = results[0].get("filesystem", "unknown") if results else "unknown"
+ plt.title(
+ f"Performance Summary - Filesystem: {fs_type.upper()}",
+ fontsize=14,
+ fontweight="bold",
+ pad=20,
+ )
+
+ plt.savefig(
+ os.path.join(output_dir, "performance_summary.png"),
+ dpi=150,
+ bbox_inches="tight",
+ )
+ plt.close()
+
+    print("Generated performance summary table")
+
+
+def main():
+ if len(sys.argv) < 3:
+ print("Usage: generate_better_graphs.py <results_dir> <output_dir>")
+ sys.exit(1)
+
+ results_dir = sys.argv[1]
+ output_dir = sys.argv[2]
+
+ # Create output directory if it doesn't exist
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Load results
+ results = load_results(results_dir)
+ print(f"Loaded {len(results)} result files")
+
+ if not results:
+ print("No results found!")
+ sys.exit(1)
+
+ # Generate graphs
+ print("Generating QPS comparison chart...")
+ create_qps_comparison_chart(results, output_dir)
+
+ print("Generating latency comparison chart...")
+ create_latency_comparison_chart(results, output_dir)
+
+ print("Generating insert performance chart...")
+ create_insert_performance_chart(results, output_dir)
+
+ print("Generating performance summary table...")
+ create_performance_summary_table(results, output_dir)
+
+ print(f"\nAnalysis complete! Graphs saved to {output_dir}")
+
+ # Print summary
+ fs_type = results[0].get("filesystem", "unknown")
+ print(f"Filesystem detected: {fs_type}")
+ print(f"Total tests analyzed: {len(results)}")
+
+ baseline_count = sum(1 for r in results if r.get("node_type") == "baseline")
+ dev_count = sum(1 for r in results if r.get("node_type") == "dev")
+ print(f"Baseline tests: {baseline_count}")
+ print(f"Development tests: {dev_count}")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/playbooks/roles/ai_collect_results/files/generate_graphs.py b/playbooks/roles/ai_collect_results/files/generate_graphs.py
new file mode 100755
index 00000000..53a835e2
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/files/generate_graphs.py
@@ -0,0 +1,678 @@
+#!/usr/bin/env python3
+"""
+Generate graphs and analysis for AI benchmark results
+"""
+
+import json
+import os
+import sys
+import glob
+import numpy as np
+import matplotlib
+
+matplotlib.use("Agg") # Use non-interactive backend
+import matplotlib.pyplot as plt
+from datetime import datetime
+from pathlib import Path
+from collections import defaultdict
+
+
+def load_results(results_dir):
+ """Load all JSON result files from the directory"""
+ results = []
+ json_files = glob.glob(os.path.join(results_dir, "*.json"))
+
+ for json_file in json_files:
+ try:
+ with open(json_file, "r") as f:
+ data = json.load(f)
+ # Extract filesystem info - prefer from JSON data over filename
+ filename = os.path.basename(json_file)
+
+ # First, try to get filesystem from the JSON data itself
+ fs_type = data.get("filesystem", None)
+
+ # If not in JSON, try to parse from filename (backwards compatibility)
+ if not fs_type:
+ parts = filename.replace("results_", "").replace(".json", "").split("-")
+
+ # Parse host info
+ if "debian13-ai-" in filename:
+ host_parts = (
+ filename.replace("results_debian13-ai-", "")
+ .replace("_1.json", "")
+ .replace("_2.json", "")
+ .replace("_3.json", "")
+ .split("-")
+ )
+ if "xfs" in host_parts[0]:
+ fs_type = "xfs"
+ # Extract block size (e.g., "4k", "16k", etc.)
+ block_size = host_parts[1] if len(host_parts) > 1 else "unknown"
+ elif "ext4" in host_parts[0]:
+ fs_type = "ext4"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "btrfs" in host_parts[0]:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ # If filesystem came from JSON, set appropriate block size
+ if fs_type == "btrfs":
+ block_size = "default"
+ elif fs_type in ["ext4", "xfs"]:
+ block_size = data.get("block_size", "4k")
+ else:
+ block_size = data.get("block_size", "default")
+
+ is_dev = "dev" in filename
+
+ # Use filesystem from JSON if available, otherwise use parsed value
+ if "filesystem" not in data:
+ data["filesystem"] = fs_type
+ data["block_size"] = block_size
+ data["is_dev"] = is_dev
+ data["filename"] = filename
+
+ results.append(data)
+ except Exception as e:
+ print(f"Error loading {json_file}: {e}")
+
+ return results
+
+
+def create_filesystem_comparison_chart(results, output_dir):
+ """Create a bar chart comparing performance across filesystems"""
+ # Group by filesystem and baseline/dev
+ fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ category = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Extract actual performance data from results
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+ fs_data[fs][category].append(insert_qps)
+
+ # Prepare data for plotting
+ filesystems = list(fs_data.keys())
+ baseline_means = [
+ np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
+ for fs in filesystems
+ ]
+ dev_means = [
+ np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
+ ]
+
+ x = np.arange(len(filesystems))
+ width = 0.35
+
+ fig, ax = plt.subplots(figsize=(10, 6))
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
+ )
+
+ ax.set_xlabel("Filesystem")
+ ax.set_ylabel("Insert QPS")
+ ax.set_title("Vector Database Performance by Filesystem")
+ ax.set_xticks(x)
+ ax.set_xticklabels(filesystems)
+ ax.legend()
+ ax.grid(True, alpha=0.3)
+
+ # Add value labels on bars
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
+ plt.close()
+
+
+def create_block_size_analysis(results, output_dir):
+ """Create analysis for different block sizes (XFS specific)"""
+ # Filter XFS results
+ xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
+
+ if not xfs_results:
+ return
+
+ # Group by block size
+ block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in xfs_results:
+ block_size = result.get("block_size", "unknown")
+ category = "dev" if result.get("is_dev", False) else "baseline"
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+ block_size_data[block_size][category].append(insert_qps)
+
+ # Sort block sizes
+ block_sizes = sorted(
+ block_size_data.keys(),
+ key=lambda x: (
+ int(x.replace("k", "").replace("s", ""))
+ if x not in ["unknown", "default"]
+ else 0
+ ),
+ )
+
+ # Create grouped bar chart
+ baseline_means = [
+ (
+ np.mean(block_size_data[bs]["baseline"])
+ if block_size_data[bs]["baseline"]
+ else 0
+ )
+ for bs in block_sizes
+ ]
+ dev_means = [
+ np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
+ for bs in block_sizes
+ ]
+
+ x = np.arange(len(block_sizes))
+ width = 0.35
+
+ fig, ax = plt.subplots(figsize=(12, 6))
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_means, width, label="Development", color="#d62728"
+ )
+
+ ax.set_xlabel("Block Size")
+ ax.set_ylabel("Insert QPS")
+ ax.set_title("XFS Performance by Block Size")
+ ax.set_xticks(x)
+ ax.set_xticklabels(block_sizes)
+ ax.legend()
+ ax.grid(True, alpha=0.3)
+
+ # Add value labels
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
+ plt.close()
+
+
+def create_heatmap_analysis(results, output_dir):
+ """Create a heatmap showing performance across all configurations"""
+ # Group data by configuration and version
+ config_data = defaultdict(
+ lambda: {
+ "baseline": {"insert": 0, "query": 0},
+ "dev": {"insert": 0, "query": 0},
+ }
+ )
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ config = f"{fs}-{block_size}"
+ version = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Get actual insert performance
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ config_data[config][version]["insert"] = insert_qps
+ config_data[config][version]["query"] = query_qps
+
+ # Sort configurations
+ configs = sorted(config_data.keys())
+
+ # Prepare data for heatmap
+ insert_baseline = [config_data[c]["baseline"]["insert"] for c in configs]
+ insert_dev = [config_data[c]["dev"]["insert"] for c in configs]
+ query_baseline = [config_data[c]["baseline"]["query"] for c in configs]
+ query_dev = [config_data[c]["dev"]["query"] for c in configs]
+
+ # Create figure with custom heatmap
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
+
+ # Create data matrices
+ insert_data = np.array([insert_baseline, insert_dev]).T
+ query_data = np.array([query_baseline, query_dev]).T
+
+ # Insert QPS heatmap
+ im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
+ ax1.set_xticks([0, 1])
+ ax1.set_xticklabels(["Baseline", "Development"])
+ ax1.set_yticks(range(len(configs)))
+ ax1.set_yticklabels(configs)
+ ax1.set_title("Insert Performance Heatmap")
+ ax1.set_ylabel("Configuration")
+
+ # Add text annotations
+ for i in range(len(configs)):
+ for j in range(2):
+ text = ax1.text(
+ j,
+ i,
+ f"{int(insert_data[i, j])}",
+ ha="center",
+ va="center",
+ color="black",
+ )
+
+ # Add colorbar
+ cbar1 = plt.colorbar(im1, ax=ax1)
+ cbar1.set_label("Insert QPS")
+
+ # Query QPS heatmap
+ im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
+ ax2.set_xticks([0, 1])
+ ax2.set_xticklabels(["Baseline", "Development"])
+ ax2.set_yticks(range(len(configs)))
+ ax2.set_yticklabels(configs)
+ ax2.set_title("Query Performance Heatmap")
+
+ # Add text annotations
+ for i in range(len(configs)):
+ for j in range(2):
+ text = ax2.text(
+ j,
+ i,
+ f"{int(query_data[i, j])}",
+ ha="center",
+ va="center",
+ color="black",
+ )
+
+ # Add colorbar
+ cbar2 = plt.colorbar(im2, ax=ax2)
+ cbar2.set_label("Query QPS")
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150)
+ plt.close()
+
+
+def create_performance_trends(results, output_dir):
+ """Create line charts showing performance trends"""
+ # Group by filesystem type
+ fs_types = defaultdict(
+ lambda: {
+ "configs": [],
+ "baseline_insert": [],
+ "dev_insert": [],
+ "baseline_query": [],
+ "dev_query": [],
+ }
+ )
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ config = f"{block_size}"
+
+ if config not in fs_types[fs]["configs"]:
+ fs_types[fs]["configs"].append(config)
+ fs_types[fs]["baseline_insert"].append(0)
+ fs_types[fs]["dev_insert"].append(0)
+ fs_types[fs]["baseline_query"].append(0)
+ fs_types[fs]["dev_query"].append(0)
+
+ idx = fs_types[fs]["configs"].index(config)
+
+ # Calculate average query QPS from all test configurations
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ if result.get("is_dev", False):
+ if "insert_performance" in result:
+ fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
+ "vectors_per_second", 0
+ )
+ fs_types[fs]["dev_query"][idx] = query_qps
+ else:
+ if "insert_performance" in result:
+ fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
+ "vectors_per_second", 0
+ )
+ fs_types[fs]["baseline_query"][idx] = query_qps
+
+ # Create separate plots for each filesystem
+ for fs, data in fs_types.items():
+ if not data["configs"]:
+ continue
+
+ fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
+
+ x = range(len(data["configs"]))
+
+ # Insert performance
+ ax1.plot(
+ x,
+ data["baseline_insert"],
+ "o-",
+ label="Baseline",
+ linewidth=2,
+ markersize=8,
+ )
+ ax1.plot(
+ x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
+ )
+ ax1.set_xlabel("Configuration")
+ ax1.set_ylabel("Insert QPS")
+ ax1.set_title(f"{fs.upper()} Insert Performance")
+ ax1.set_xticks(x)
+ ax1.set_xticklabels(data["configs"])
+ ax1.legend()
+ ax1.grid(True, alpha=0.3)
+
+ # Query performance
+ ax2.plot(
+ x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
+ )
+ ax2.plot(
+ x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
+ )
+ ax2.set_xlabel("Configuration")
+ ax2.set_ylabel("Query QPS")
+ ax2.set_title(f"{fs.upper()} Query Performance")
+ ax2.set_xticks(x)
+ ax2.set_xticklabels(data["configs"])
+ ax2.legend()
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
+ plt.close()
+
+
+def create_simple_performance_trends(results, output_dir):
+ """Create a simple performance trends chart for basic Milvus testing"""
+ if not results:
+ return
+
+ # Separate baseline and dev results
+ baseline_results = [r for r in results if not r.get("is_dev", False)]
+ dev_results = [r for r in results if r.get("is_dev", False)]
+
+ if not baseline_results and not dev_results:
+ return
+
+ fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
+
+ # Prepare data
+ baseline_insert = []
+ baseline_query = []
+ dev_insert = []
+ dev_query = []
+ labels = []
+
+ # Process baseline results
+ for i, result in enumerate(baseline_results):
+ if "insert_performance" in result:
+ baseline_insert.append(result["insert_performance"].get("vectors_per_second", 0))
+ else:
+ baseline_insert.append(0)
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+ baseline_query.append(query_qps)
+ labels.append(f"Run {i+1}")
+
+ # Process dev results
+ for result in dev_results:
+ if "insert_performance" in result:
+ dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
+ else:
+ dev_insert.append(0)
+
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+ dev_query.append(query_qps)
+
+ x = range(len(baseline_results) if baseline_results else len(dev_results))
+
+ # Insert performance
+ if baseline_insert:
+ ax1.plot(x, baseline_insert, "o-", label="Baseline", linewidth=2, markersize=8)
+ if dev_insert:
+ ax1.plot(x[:len(dev_insert)], dev_insert, "s-", label="Development", linewidth=2, markersize=8)
+ ax1.set_xlabel("Test Run")
+ ax1.set_ylabel("Insert QPS")
+ ax1.set_title("Milvus Insert Performance")
+ ax1.set_xticks(x)
+ ax1.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
+ ax1.legend()
+ ax1.grid(True, alpha=0.3)
+
+ # Query performance
+ if baseline_query:
+ ax2.plot(x, baseline_query, "o-", label="Baseline", linewidth=2, markersize=8)
+ if dev_query:
+        ax2.plot(
+            x[: len(dev_query)],
+            dev_query,
+            "s-",
+            label="Development",
+            linewidth=2,
+            markersize=8,
+        )
+ ax2.set_xlabel("Test Run")
+ ax2.set_ylabel("Query QPS")
+ ax2.set_title("Milvus Query Performance")
+ ax2.set_xticks(x)
+ ax2.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
+ ax2.legend()
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+ plt.close()
+
+
+def generate_summary_statistics(results, output_dir):
+ """Generate summary statistics and save to JSON"""
+ summary = {
+ "total_tests": len(results),
+ "filesystems_tested": list(
+ set(r.get("filesystem", "unknown") for r in results)
+ ),
+ "configurations": {},
+ "performance_summary": {
+ "best_insert_qps": {"value": 0, "config": ""},
+ "best_query_qps": {"value": 0, "config": ""},
+ "average_insert_qps": 0,
+ "average_query_qps": 0,
+ },
+ }
+
+ # Calculate statistics
+ all_insert_qps = []
+ all_query_qps = []
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ is_dev = "dev" if result.get("is_dev", False) else "baseline"
+ config_name = f"{fs}-{block_size}-{is_dev}"
+
+ # Get actual performance metrics
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ all_insert_qps.append(insert_qps)
+ all_query_qps.append(query_qps)
+
+ summary["configurations"][config_name] = {
+ "insert_qps": insert_qps,
+ "query_qps": query_qps,
+ "host": result.get("host", "unknown"),
+ }
+
+ if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
+ summary["performance_summary"]["best_insert_qps"] = {
+ "value": insert_qps,
+ "config": config_name,
+ }
+
+ if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
+ summary["performance_summary"]["best_query_qps"] = {
+ "value": query_qps,
+ "config": config_name,
+ }
+
+ summary["performance_summary"]["average_insert_qps"] = (
+ np.mean(all_insert_qps) if all_insert_qps else 0
+ )
+ summary["performance_summary"]["average_query_qps"] = (
+ np.mean(all_query_qps) if all_query_qps else 0
+ )
+
+ # Save summary
+ with open(os.path.join(output_dir, "summary.json"), "w") as f:
+ json.dump(summary, f, indent=2)
+
+ return summary
+
+
+def main():
+ if len(sys.argv) < 3:
+ print("Usage: generate_graphs.py <results_dir> <output_dir>")
+ sys.exit(1)
+
+ results_dir = sys.argv[1]
+ output_dir = sys.argv[2]
+
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Load results
+ results = load_results(results_dir)
+
+ if not results:
+ print("No results found to analyze")
+ sys.exit(1)
+
+ print(f"Loaded {len(results)} result files")
+
+ # Generate graphs
+ print("Generating performance heatmap...")
+ create_heatmap_analysis(results, output_dir)
+
+ print("Generating performance trends...")
+ create_simple_performance_trends(results, output_dir)
+
+ print("Generating summary statistics...")
+ summary = generate_summary_statistics(results, output_dir)
+
+ print(f"\nAnalysis complete! Graphs saved to {output_dir}")
+ print(f"Total configurations tested: {summary['total_tests']}")
+    best_insert = summary["performance_summary"]["best_insert_qps"]
+    best_query = summary["performance_summary"]["best_query_qps"]
+    print(f"Best insert QPS: {best_insert['value']:.2f} ({best_insert['config']})")
+    print(f"Best query QPS: {best_query['value']:.2f} ({best_query['config']})")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/playbooks/roles/ai_collect_results/files/generate_html_report.py b/playbooks/roles/ai_collect_results/files/generate_html_report.py
new file mode 100755
index 00000000..a205577c
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/files/generate_html_report.py
@@ -0,0 +1,427 @@
+#!/usr/bin/env python3
+"""
+Generate HTML report for AI benchmark results
+"""
+
+import json
+import os
+import sys
+import glob
+from datetime import datetime
+from pathlib import Path
+
+HTML_TEMPLATE = """
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>AI Benchmark Results - {timestamp}</title>
+ <style>
+ body {{
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
+ line-height: 1.6;
+ color: #333;
+ max-width: 1400px;
+ margin: 0 auto;
+ padding: 20px;
+ background-color: #f5f5f5;
+ }}
+ .header {{
+ background-color: #2c3e50;
+ color: white;
+ padding: 30px;
+ border-radius: 8px;
+ margin-bottom: 30px;
+ text-align: center;
+ }}
+ h1 {{
+ margin: 0;
+ font-size: 2.5em;
+ }}
+ .subtitle {{
+ margin-top: 10px;
+ opacity: 0.9;
+ }}
+ .summary-cards {{
+ display: grid;
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
+ gap: 20px;
+ margin-bottom: 40px;
+ }}
+ .card {{
+ background: white;
+ padding: 20px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ text-align: center;
+ }}
+ .card h3 {{
+ margin: 0 0 10px 0;
+ color: #2c3e50;
+ }}
+ .card .value {{
+ font-size: 2em;
+ font-weight: bold;
+ color: #3498db;
+ }}
+ .card .label {{
+ color: #7f8c8d;
+ font-size: 0.9em;
+ }}
+ .section {{
+ background: white;
+ padding: 30px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ margin-bottom: 30px;
+ }}
+ .section h2 {{
+ color: #2c3e50;
+ border-bottom: 2px solid #3498db;
+ padding-bottom: 10px;
+ margin-bottom: 20px;
+ }}
+ .graph-container {{
+ text-align: center;
+ margin: 20px 0;
+ }}
+ .graph-container img {{
+ max-width: 100%;
+ height: auto;
+ border-radius: 4px;
+ box-shadow: 0 2px 8px rgba(0,0,0,0.1);
+ }}
+ .results-table {{
+ width: 100%;
+ border-collapse: collapse;
+ margin-top: 20px;
+ }}
+ .results-table th, .results-table td {{
+ padding: 12px;
+ text-align: left;
+ border-bottom: 1px solid #ddd;
+ }}
+ .results-table th {{
+ background-color: #f8f9fa;
+ font-weight: 600;
+ color: #2c3e50;
+ }}
+ .results-table tr:hover {{
+ background-color: #f8f9fa;
+ }}
+ .baseline {{
+ background-color: #e8f4fd;
+ }}
+ .dev {{
+ background-color: #fff3cd;
+ }}
+ .footer {{
+ text-align: center;
+ padding: 20px;
+ color: #7f8c8d;
+ font-size: 0.9em;
+ }}
+ .graph-grid {{
+ display: grid;
+ grid-template-columns: repeat(auto-fit, minmax(500px, 1fr));
+ gap: 20px;
+ margin: 20px 0;
+ }}
+ .best-config {{
+ background-color: #d4edda;
+ font-weight: bold;
+ }}
+ .navigation {{
+ position: sticky;
+ top: 20px;
+ background: white;
+ padding: 20px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ margin-bottom: 30px;
+ }}
+ .navigation ul {{
+ list-style: none;
+ padding: 0;
+ margin: 0;
+ }}
+ .navigation li {{
+ display: inline-block;
+ margin-right: 20px;
+ }}
+ .navigation a {{
+ color: #3498db;
+ text-decoration: none;
+ font-weight: 500;
+ }}
+ .navigation a:hover {{
+ text-decoration: underline;
+ }}
+ </style>
+</head>
+<body>
+ <div class="header">
+ <h1>AI Vector Database Benchmark Results</h1>
+ <div class="subtitle">Generated on {timestamp}</div>
+ </div>
+
+ <nav class="navigation">
+ <ul>
+ <li><a href="#summary">Summary</a></li>
+ <li><a href="#performance-metrics">Performance Metrics</a></li>
+ <li><a href="#performance-trends">Performance Trends</a></li>
+ <li><a href="#detailed-results">Detailed Results</a></li>
+ </ul>
+ </nav>
+
+ <div id="summary" class="summary-cards">
+ <div class="card">
+ <h3>Total Tests</h3>
+ <div class="value">{total_tests}</div>
+ <div class="label">Configurations</div>
+ </div>
+ <div class="card">
+ <h3>Best Insert QPS</h3>
+ <div class="value">{best_insert_qps}</div>
+ <div class="label">{best_insert_config}</div>
+ </div>
+ <div class="card">
+ <h3>Best Query QPS</h3>
+ <div class="value">{best_query_qps}</div>
+ <div class="label">{best_query_config}</div>
+ </div>
+ <div class="card">
+ <h3>Test Runs</h3>
+ <div class="value">{total_tests}</div>
+ <div class="label">Benchmark Executions</div>
+ </div>
+ </div>
+
+ <div id="performance-metrics" class="section">
+ <h2>Performance Metrics</h2>
+ <p>Key performance indicators for Milvus vector database operations.</p>
+ <div class="graph-container">
+ <img src="graphs/performance_heatmap.png" alt="Performance Metrics">
+ </div>
+ </div>
+
+ <div id="performance-trends" class="section">
+ <h2>Performance Trends</h2>
+ <p>Performance comparison between baseline and development configurations.</p>
+ <div class="graph-container">
+ <img src="graphs/performance_trends.png" alt="Performance Trends">
+ </div>
+ </div>
+
+ <div id="detailed-results" class="section">
+ <h2>Detailed Results Table</h2>
+ <table class="results-table">
+ <thead>
+ <tr>
+ <th>Host</th>
+ <th>Type</th>
+ <th>Insert QPS</th>
+ <th>Query QPS</th>
+ <th>Timestamp</th>
+ </tr>
+ </thead>
+ <tbody>
+ {table_rows}
+ </tbody>
+ </table>
+ </div>
+
+ <div class="footer">
+ <p>Generated by kdevops AI Benchmark Suite | <a href="https://github.com/linux-kdevops/kdevops">GitHub</a></p>
+ </div>
+</body>
+</html>
+"""
+
+
+def load_summary(graphs_dir):
+ """Load the summary.json file"""
+ summary_path = os.path.join(graphs_dir, "summary.json")
+ if os.path.exists(summary_path):
+ with open(summary_path, "r") as f:
+ return json.load(f)
+ return None
+
+
+def load_results(results_dir):
+ """Load all result files for detailed table"""
+ results = []
+ json_files = glob.glob(os.path.join(results_dir, "*.json"))
+
+ for json_file in json_files:
+ try:
+ with open(json_file, "r") as f:
+ data = json.load(f)
+ # Get filesystem from JSON data first, then fallback to filename parsing
+ filename = os.path.basename(json_file)
+
+ # Skip results without valid performance data
+ insert_perf = data.get("insert_performance", {})
+ query_perf = data.get("query_performance", {})
+ if not insert_perf or not query_perf:
+ continue
+
+ # Get filesystem from JSON data
+ fs_type = data.get("filesystem", None)
+
+ # If not in JSON, try to parse from filename (backwards compatibility)
+ if not fs_type and "debian13-ai" in filename:
+ host_parts = (
+ filename.replace("results_debian13-ai-", "")
+ .replace("_1.json", "")
+ .replace("_2.json", "")
+ .replace("_3.json", "")
+ .split("-")
+ )
+ if "xfs" in host_parts[0]:
+ fs_type = "xfs"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "ext4" in host_parts[0]:
+ fs_type = "ext4"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "btrfs" in host_parts[0]:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ # Set appropriate block size based on filesystem
+ if fs_type == "btrfs":
+ block_size = "default"
+ else:
+ block_size = data.get("block_size", "default")
+
+ # Default to unknown if still not found
+ if not fs_type:
+ fs_type = "unknown"
+ block_size = "unknown"
+
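+            # Development hosts carry a "-dev" hostname suffix (see the
+            # host_vars files), which ends up in the results_<host>_<n>.json
+            # filename fetched from each guest.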
+ is_dev = "dev" in filename
+
+ # Calculate average QPS from query performance data
+ query_qps = 0
+ query_count = 0
+ for topk_data in query_perf.values():
+ for batch_data in topk_data.values():
+ qps = batch_data.get("queries_per_second", 0)
+ if qps > 0:
+ query_qps += qps
+ query_count += 1
+ if query_count > 0:
+ query_qps = query_qps / query_count
+
+ results.append(
+ {
+ "host": filename.replace("results_", "").replace(".json", ""),
+ "filesystem": fs_type,
+ "block_size": block_size,
+ "type": "Development" if is_dev else "Baseline",
+ "insert_qps": insert_perf.get("vectors_per_second", 0),
+ "query_qps": query_qps,
+ "timestamp": data.get("timestamp", "N/A"),
+ "is_dev": is_dev,
+ }
+ )
+ except Exception as e:
+ print(f"Error loading {json_file}: {e}")
+
+ # Sort by filesystem, block size, then type
+ results.sort(key=lambda x: (x["filesystem"], x["block_size"], x["type"]))
+ return results
+
+
+def generate_table_rows(results, best_configs):
+ """Generate HTML table rows"""
+ rows = []
+ for result in results:
+ config_key = f"{result['filesystem']}-{result['block_size']}-{'dev' if result['is_dev'] else 'baseline'}"
+ row_class = "dev" if result["is_dev"] else "baseline"
+
+ # Check if this is a best configuration
+ if config_key in best_configs:
+ row_class += " best-config"
+
+ row = f"""
+ <tr class="{row_class}">
+ <td>{result['host']}</td>
+ <td>{result['type']}</td>
+            <td>{result['insert_qps']:,.1f}</td>
+            <td>{result['query_qps']:,.1f}</td>
+ <td>{result['timestamp']}</td>
+ </tr>
+ """
+ rows.append(row)
+
+ return "\n".join(rows)
+
+
+def find_performance_trend_graphs(graphs_dir):
+ """Find performance trend graph"""
+ # Not used in basic implementation since we embed the graph directly
+ return ""
+
+
+def generate_html_report(results_dir, graphs_dir, output_path):
+ """Generate the HTML report"""
+ # Load summary
+ summary = load_summary(graphs_dir)
+ if not summary:
+ print("Warning: No summary.json found")
+ summary = {
+ "total_tests": 0,
+ "filesystems_tested": [],
+ "performance_summary": {
+ "best_insert_qps": {"value": 0, "config": "N/A"},
+ "best_query_qps": {"value": 0, "config": "N/A"},
+ },
+ }
+
+ # Load detailed results
+ results = load_results(results_dir)
+
+ # Find best configurations
+ best_configs = set()
+ if summary["performance_summary"]["best_insert_qps"]["config"]:
+ best_configs.add(summary["performance_summary"]["best_insert_qps"]["config"])
+ if summary["performance_summary"]["best_query_qps"]["config"]:
+ best_configs.add(summary["performance_summary"]["best_query_qps"]["config"])
+
+ # Generate HTML
+ html_content = HTML_TEMPLATE.format(
+ timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ total_tests=summary["total_tests"],
+ best_insert_qps=f"{summary['performance_summary']['best_insert_qps']['value']:,}",
+ best_insert_config=summary["performance_summary"]["best_insert_qps"]["config"],
+ best_query_qps=f"{summary['performance_summary']['best_query_qps']['value']:,}",
+ best_query_config=summary["performance_summary"]["best_query_qps"]["config"],
+ table_rows=generate_table_rows(results, best_configs),
+ )
+
+ # Write HTML file
+ with open(output_path, "w") as f:
+ f.write(html_content)
+
+ print(f"HTML report generated: {output_path}")
+
+
+def main():
+ if len(sys.argv) < 4:
+ print("Usage: generate_html_report.py <results_dir> <graphs_dir> <output_html>")
+ sys.exit(1)
+
+ results_dir = sys.argv[1]
+ graphs_dir = sys.argv[2]
+ output_html = sys.argv[3]
+
+ generate_html_report(results_dir, graphs_dir, output_html)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/playbooks/roles/ai_collect_results/tasks/main.yml b/playbooks/roles/ai_collect_results/tasks/main.yml
new file mode 100644
index 00000000..6a15d89c
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/tasks/main.yml
@@ -0,0 +1,220 @@
+---
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Set local directories
+ ansible.builtin.set_fact:
+ local_results_dir: "{{ topdir_path }}/workflows/ai/results"
+ local_scripts_dir: "{{ topdir_path }}/workflows/ai/scripts"
+ run_once: true
+ delegate_to: localhost
+
+- name: Create local directories if they don't exist
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: directory
+ mode: '0755'
+ loop:
+ - "{{ local_results_dir }}"
+ - "{{ local_scripts_dir }}"
+ run_once: true
+ delegate_to: localhost
+
+- name: Create analysis directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/analysis"
+ state: directory
+ mode: '0755'
+ become: true
+
+- name: Copy analysis scripts to scripts directory
+ ansible.builtin.copy:
+ src: "{{ item }}"
+ dest: "{{ local_scripts_dir }}/{{ item }}"
+ mode: '0755'
+ force: yes
+ loop:
+ - analyze_results.py
+ - generate_graphs.py
+ - generate_html_report.py
+ run_once: true
+ delegate_to: localhost
+ become: true
+
+- name: Generate analysis configuration
+ ansible.builtin.template:
+ src: analysis_config.json.j2
+ dest: "{{ local_scripts_dir }}/analysis_config.json"
+ mode: '0644'
+ run_once: true
+ delegate_to: localhost
+ when: ai_benchmark_enable_graphing | bool
+
+- name: Check if benchmark results exist
+ ansible.builtin.stat:
+ path: "{{ ai_benchmark_results_dir }}"
+ register: results_dir_check
+
+- name: Find benchmark result files on remote host
+ ansible.builtin.find:
+ paths: "{{ ai_benchmark_results_dir }}"
+ patterns: "results_*.json"
+ register: remote_results
+ when: results_dir_check.stat.exists
+
+- name: Clean up entire local results directory before collection
+ ansible.builtin.file:
+ path: "{{ local_results_dir }}"
+ state: absent
+ run_once: true
+ delegate_to: localhost
+ become: true
+ when:
+ - results_dir_check.stat.exists
+ - remote_results.files is defined
+
+- name: Recreate local results directory with correct permissions
+ ansible.builtin.file:
+ path: "{{ local_results_dir }}"
+ state: directory
+ mode: '0755'
+ run_once: true
+ delegate_to: localhost
+ become: false
+ when:
+ - results_dir_check.stat.exists
+ - remote_results.files is defined
+
+- name: Collect result files from all hosts
+ ansible.builtin.fetch:
+ src: "{{ item.path }}"
+ dest: "{{ local_results_dir }}/{{ item.path | basename }}"
+ flat: true
+ mode: '0644'
+ loop: "{{ remote_results.files | default([]) }}"
+ when:
+ - results_dir_check.stat.exists
+ - remote_results.files is defined
+
+- name: Check if any results were collected
+ ansible.builtin.find:
+ paths: "{{ local_results_dir }}"
+ patterns: "*results_*.json"
+ register: collected_results
+ run_once: true
+ delegate_to: localhost
+
+- name: Display message if no results found
+ ansible.builtin.debug:
+ msg: |
+ No benchmark results found to analyze.
+ Please run 'make ai-tests' first to generate benchmark results.
+ when: collected_results.files is not defined or collected_results.files | length == 0
+ run_once: true
+ delegate_to: localhost
+
+- name: Ensure results directory has correct permissions
+ ansible.builtin.file:
+ path: "{{ local_results_dir }}"
+ owner: "{{ lookup('env', 'USER') }}"
+ group: "{{ lookup('env', 'USER') }}"
+ mode: '0755'
+ recurse: true
+ run_once: true
+ delegate_to: localhost
+ become: true
+ tags: ['results', 'analysis']
+
+- name: Run results analysis
+ ansible.builtin.command: >
+ python3 {{ local_scripts_dir }}/analyze_results.py
+ --results-dir {{ local_results_dir }}
+ --output-dir {{ local_results_dir }}
+ {% if ai_benchmark_enable_graphing | bool %}--config {{ local_scripts_dir }}/analysis_config.json{% endif %}
+ register: analysis_result
+ run_once: true
+ delegate_to: localhost
+ when: collected_results.files is defined and collected_results.files | length > 0
+ tags: ['results', 'analysis']
+
+
+- name: Create graphs directory
+ ansible.builtin.file:
+ path: "{{ local_results_dir }}/graphs"
+ state: directory
+ mode: '0755'
+ run_once: true
+ delegate_to: localhost
+ when:
+ - collected_results.files is defined
+ - collected_results.files | length > 0
+ tags: ['results', 'graphs']
+
+- name: Generate performance graphs
+ ansible.builtin.command: >
+ python3 {{ local_scripts_dir }}/generate_better_graphs.py
+ {{ local_results_dir }}
+ {{ local_results_dir }}/graphs
+ register: graph_generation_result
+ failed_when: false
+ run_once: true
+ delegate_to: localhost
+ when:
+ - collected_results.files is defined
+ - collected_results.files | length > 0
+ - ai_benchmark_enable_graphing|bool
+ tags: ['results', 'graphs']
+
+- name: Fallback to basic graphs if better graphs fail
+ ansible.builtin.command: >
+ python3 {{ local_scripts_dir }}/generate_graphs.py
+ {{ local_results_dir }}
+ {{ local_results_dir }}/graphs
+ run_once: true
+ delegate_to: localhost
+ when:
+ - collected_results.files is defined
+ - collected_results.files | length > 0
+ - ai_benchmark_enable_graphing|bool
+ - graph_generation_result is defined
+ - graph_generation_result.rc != 0
+ tags: ['results', 'graphs']
+
+- name: Generate HTML report
+ ansible.builtin.command: >
+ python3 {{ local_scripts_dir }}/generate_html_report.py
+ {{ local_results_dir }}
+ {{ local_results_dir }}/graphs
+ {{ local_results_dir }}/benchmark_report.html
+ register: html_generation_result
+ run_once: true
+ delegate_to: localhost
+ when:
+ - collected_results.files is defined
+ - collected_results.files | length > 0
+
+- name: Display analysis completion message
+ ansible.builtin.debug:
+ msg: |
+ Benchmark analysis completed!
+ Results available in: {{ local_results_dir }}/
+ Summary report: {{ local_results_dir }}/benchmark_summary.txt
+ HTML report: {{ local_results_dir }}/benchmark_report.html
+ {% if ai_benchmark_enable_graphing | bool %}
+ Graphs generated in: {{ local_results_dir }}/graphs/
+ {% endif %}
+
+ To view the HTML report:
+ - Open {{ local_results_dir }}/benchmark_report.html in a web browser
+ run_once: true
+ delegate_to: localhost
+ when:
+ - collected_results.files is defined
+ - collected_results.files | length > 0
+ - analysis_result is defined
+ - analysis_result.rc == 0
diff --git a/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2 b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
new file mode 100644
index 00000000..5a879649
--- /dev/null
+++ b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
@@ -0,0 +1,6 @@
+{
+ "enable_graphing": {{ ai_benchmark_enable_graphing|default(true)|lower }},
+ "graph_format": "{{ ai_benchmark_graph_format|default('png') }}",
+ "graph_dpi": {{ ai_benchmark_graph_dpi|default(150) }},
+ "graph_theme": "{{ ai_benchmark_graph_theme|default('seaborn') }}"
+}
diff --git a/playbooks/roles/ai_destroy/tasks/main.yml b/playbooks/roles/ai_destroy/tasks/main.yml
new file mode 100644
index 00000000..29406b37
--- /dev/null
+++ b/playbooks/roles/ai_destroy/tasks/main.yml
@@ -0,0 +1,63 @@
+---
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Stop and remove all AI containers
+ community.docker.docker_container:
+ name: "{{ item }}"
+ state: absent
+ loop:
+ - "{{ ai_milvus_container_name }}"
+ - "{{ ai_minio_container_name }}"
+ - "{{ ai_etcd_container_name }}"
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Remove Docker network
+ community.docker.docker_network:
+ name: "{{ ai_docker_network_name }}"
+ state: absent
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Remove Docker storage directories
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: absent
+ loop:
+ - "{{ ai_docker_data_path }}"
+ - "{{ ai_docker_etcd_data_path }}"
+ - "{{ ai_docker_minio_data_path }}"
+ when: ai_milvus_docker | bool
+ become: true
+
+- name: Remove benchmark results directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}"
+ state: absent
+ become: true
+
+- name: Remove Docker images (optional)
+ community.docker.docker_image:
+ name: "{{ item }}"
+ state: absent
+ loop:
+ - "{{ ai_milvus_container_image_string }}"
+ - "{{ ai_etcd_container_image_string }}"
+ - "{{ ai_minio_container_image_string }}"
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Display destroy completion message
+ ansible.builtin.debug:
+ msg: |
+ AI benchmark environment completely destroyed.
+ All data, containers, and results have been removed.
diff --git a/playbooks/roles/ai_docker_storage/tasks/main.yml b/playbooks/roles/ai_docker_storage/tasks/main.yml
new file mode 100644
index 00000000..612df3cb
--- /dev/null
+++ b/playbooks/roles/ai_docker_storage/tasks/main.yml
@@ -0,0 +1,123 @@
+---
+- name: Import optional extra_args file
+ include_vars: "{{ item }}"
+ ignore_errors: yes
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Docker storage setup
+ when: ai_docker_storage_enable|bool
+ block:
+ - name: Install filesystem utilities
+ package:
+ name:
+ - xfsprogs
+ - e2fsprogs
+ - btrfs-progs
+ - rsync
+ state: present
+ become: yes
+ become_method: sudo
+
+ - name: Check if device exists
+ stat:
+ path: "{{ ai_docker_device }}"
+ register: docker_device_stat
+ failed_when: not docker_device_stat.stat.exists
+
+ - name: Check if Docker storage is already mounted
+ command: mountpoint -q {{ ai_docker_mount_point }}
+ register: docker_mount_check
+ changed_when: false
+ failed_when: false
+
+ - name: Setup Docker storage filesystem
+ when: docker_mount_check.rc != 0
+ block:
+ - name: Create Docker mount point directory
+ file:
+ path: "{{ ai_docker_mount_point }}"
+ state: directory
+ mode: '0755'
+ become: yes
+ become_method: sudo
+
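+      # At most one of the mkfs tasks below runs, selected by ai_docker_fstype;
+      # block/sector sizes and extra options come from the ai_docker_* settings.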
+ - name: Format device with XFS
+ command: >
+ mkfs.xfs -f
+ -b size={{ ai_docker_xfs_blocksize | default(4096) }}
+ -s size={{ ai_docker_xfs_sectorsize | default(4096) }}
+ {{ ai_docker_xfs_mkfs_opts | default('') }}
+ {{ ai_docker_device }}
+ when: ai_docker_fstype == "xfs"
+ become: yes
+ become_method: sudo
+
+ - name: Format device with Btrfs
+ command: mkfs.btrfs {{ ai_docker_btrfs_mkfs_opts }} {{ ai_docker_device }}
+ when: ai_docker_fstype == "btrfs"
+ become: yes
+ become_method: sudo
+
+ - name: Format device with ext4
+ command: mkfs.ext4 {{ ai_docker_ext4_mkfs_opts }} {{ ai_docker_device }}
+ when: ai_docker_fstype == "ext4"
+ become: yes
+ become_method: sudo
+
+ - name: Mount Docker storage filesystem
+ mount:
+ path: "{{ ai_docker_mount_point }}"
+ src: "{{ ai_docker_device }}"
+ fstype: "{{ ai_docker_fstype }}"
+ opts: defaults,noatime
+ state: mounted
+ become: yes
+ become_method: sudo
+
+ - name: Add Docker storage mount to fstab
+ mount:
+ path: "{{ ai_docker_mount_point }}"
+ src: "{{ ai_docker_device }}"
+ fstype: "{{ ai_docker_fstype }}"
+ opts: defaults,noatime
+ state: present
+ become: yes
+ become_method: sudo
+
+ - name: Check if Docker service exists
+ systemd:
+ name: docker
+ register: docker_service_status
+ failed_when: false
+ changed_when: false
+
+ - name: Stop Docker service if running
+ systemd:
+ name: docker
+ state: stopped
+ become: yes
+ become_method: sudo
+ when: docker_service_status.status is defined and docker_service_status.status.ActiveState == 'active'
+ ignore_errors: yes
+
+ # Note: When ai_docker_storage_enable is true, we mount directly to /var/lib/docker
+ # No need to move data or create symlinks as the storage is already in the right place
+
+ - name: Ensure Docker directory has proper permissions
+ file:
+ path: "{{ ai_docker_mount_point }}"
+ state: directory
+ mode: '0711'
+ owner: root
+ group: root
+ become: yes
+ become_method: sudo
+ when: ai_docker_mount_point == '/var/lib/docker'
+
+ # Docker will be installed and started later by the ai role
+ # We only prepare the storage here
+ - name: Display Docker storage setup complete
+ debug:
+ msg: "Docker storage has been prepared at: {{ ai_docker_mount_point }}"
diff --git a/playbooks/roles/ai_install/tasks/main.yml b/playbooks/roles/ai_install/tasks/main.yml
new file mode 100644
index 00000000..820e0f64
--- /dev/null
+++ b/playbooks/roles/ai_install/tasks/main.yml
@@ -0,0 +1,90 @@
+---
+- name: Include role create_data_partition
+ include_role:
+ name: create_data_partition
+ tags: ['setup', 'data_partition']
+
+- name: Include role common
+ include_role:
+ name: common
+ when:
+ - infer_uid_and_group|bool
+
+- name: Ensure data_dir has correct ownership
+ tags: ['setup']
+ become: true
+ ansible.builtin.file:
+ path: "{{ data_path }}"
+ owner: "{{ data_user }}"
+ group: "{{ data_group }}"
+ recurse: true
+ state: directory
+ mode: '0755'
+
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Install Docker if using Docker deployment
+ ansible.builtin.apt:
+ name:
+ - docker.io
+ - docker-compose
+ state: present
+ update_cache: true
+ when: ai_milvus_docker | bool
+ become: true
+
+- name: Add user to docker group
+ ansible.builtin.user:
+ name: "{{ data_user | default(ansible_user_id) }}"
+ groups: docker
+ append: true
+ when: ai_milvus_docker | bool
+ become: true
+
+- name: Install Python dependencies for AI benchmarks
+ ansible.builtin.pip:
+ name:
+ - pymilvus>=2.3.0
+ - numpy
+ - scikit-learn
+ - pandas
+ - tqdm
+ state: present
+ when: ai_benchmark_enable_graphing | bool
+
+- name: Install additional Python dependencies for graphing
+ ansible.builtin.pip:
+ name:
+ - matplotlib
+ - seaborn
+ - plotly
+ state: present
+ when: ai_benchmark_enable_graphing | bool
+
+- name: Install filesystem utilities for XFS
+ ansible.builtin.apt:
+ name: xfsprogs
+ state: present
+ when: ai_filesystem == "xfs"
+ become: true
+
+- name: Install filesystem utilities for Btrfs
+ ansible.builtin.apt:
+ name: btrfs-progs
+ state: present
+ when: ai_filesystem == "btrfs"
+ become: true
+
+- name: Create benchmark results directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}"
+ state: directory
+ mode: '0755'
+ become: true
diff --git a/playbooks/roles/ai_results/tasks/main.yml b/playbooks/roles/ai_results/tasks/main.yml
new file mode 100644
index 00000000..094a9025
--- /dev/null
+++ b/playbooks/roles/ai_results/tasks/main.yml
@@ -0,0 +1,22 @@
+---
+# AI Results collection role
+# This role collects and aggregates benchmark results from various AI components
+
+- name: Create central results directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}"
+ state: directory
+ mode: '0755'
+
+- name: Find all benchmark result files
+ ansible.builtin.find:
+ paths: "{{ ai_benchmark_results_dir }}"
+ patterns: "*.json"
+ recurse: true
+ register: result_files
+
+- name: Display found result files
+ ansible.builtin.debug:
+ msg: "Found {{ result_files.files | length }} result files"
+
+# Future: Add result aggregation, analysis, and reporting tasks here
diff --git a/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
new file mode 100644
index 00000000..4ce14fb7
--- /dev/null
+++ b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
@@ -0,0 +1,506 @@
+#!/usr/bin/env python3
+"""
+Milvus Vector Database Benchmark Script
+
+This script performs comprehensive benchmarking of Milvus vector database
+including vector insertion, index creation, and query performance testing.
+"""
+
+import json
+import numpy as np
+import time
+import argparse
+import sys
+import subprocess
+import os
+from datetime import datetime
+from typing import List, Dict, Any, Tuple
+import logging
+
+try:
+ from pymilvus import (
+ connections,
+ Collection,
+ CollectionSchema,
+ FieldSchema,
+ DataType,
+ utility,
+ )
+ from pymilvus.client.types import LoadState
+except ImportError as e:
+ print(f"Error importing pymilvus: {e}")
+ print(f"Python executable: {sys.executable}")
+ print(f"Python path: {sys.path}")
+ print("Please ensure pymilvus is installed in the virtual environment")
+ sys.exit(1)
+
+
+class MilvusBenchmark:
+ def __init__(self, config: Dict[str, Any]):
+ self.config = config
+ self.collection = None
+ self.results = {
+ "config": config,
+ "timestamp": datetime.now().isoformat(),
+ "insert_performance": {},
+ "index_performance": {},
+ "query_performance": {},
+ "system_info": {},
+ }
+
+ # Setup logging
+ logging.basicConfig(
+ level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
+ )
+ self.logger = logging.getLogger(__name__)
+
+ def get_filesystem_info(self, path: str = "/data") -> Dict[str, str]:
+ """Detect filesystem type for the given path"""
+ try:
+ # Use df -T to get filesystem type
+ result = subprocess.run(
+ ["df", "-T", path], capture_output=True, text=True, check=True
+ )
+
+ lines = result.stdout.strip().split("\n")
+ if len(lines) >= 2:
+ # Second line contains the filesystem info
+ # Format: Filesystem Type 1K-blocks Used Available Use% Mounted on
+ parts = lines[1].split()
+ if len(parts) >= 2:
+ filesystem_type = parts[1]
+ mount_point = parts[-1] if len(parts) >= 7 else path
+
+ return {
+ "filesystem": filesystem_type,
+ "mount_point": mount_point,
+ "data_path": path,
+ }
+ except subprocess.CalledProcessError as e:
+ self.logger.warning(f"Failed to detect filesystem for {path}: {e}")
+ except Exception as e:
+ self.logger.warning(f"Error detecting filesystem for {path}: {e}")
+
+ # Fallback: try to detect from /proc/mounts
+ try:
+ with open("/proc/mounts", "r") as f:
+ mounts = f.readlines()
+
+ # Find the mount that contains our path
+ best_match = ""
+ best_fs = "unknown"
+
+ for line in mounts:
+ parts = line.strip().split()
+ if len(parts) >= 3:
+ mount_point = parts[1]
+ fs_type = parts[2]
+
+ # Check if this mount point is a prefix of our path
+ if path.startswith(mount_point) and len(mount_point) > len(
+ best_match
+ ):
+ best_match = mount_point
+ best_fs = fs_type
+
+ if best_fs != "unknown":
+ return {
+ "filesystem": best_fs,
+ "mount_point": best_match,
+ "data_path": path,
+ }
+
+ except Exception as e:
+ self.logger.warning(f"Error reading /proc/mounts: {e}")
+
+ # Final fallback
+ return {"filesystem": "unknown", "mount_point": "/", "data_path": path}
+
+ def connect_to_milvus(self) -> bool:
+ """Connect to Milvus server"""
+ try:
+ connections.connect(
+ alias="default", host=self.config["host"], port=self.config["port"]
+ )
+ self.logger.info(
+ f"Connected to Milvus at {self.config['host']}:{self.config['port']}"
+ )
+ return True
+ except Exception as e:
+ self.logger.error(f"Failed to connect to Milvus: {e}")
+ return False
+
+ def create_collection(self) -> bool:
+ """Create benchmark collection"""
+ try:
+            collection_name = self.config.get(
+                "collection_name", self.config["database_name"]
+            )
+
+ # Drop collection if exists
+ if utility.has_collection(collection_name):
+ utility.drop_collection(collection_name)
+ self.logger.info(f"Dropped existing collection: {collection_name}")
+
+ # Define schema
+ fields = [
+ FieldSchema(
+ name="id", dtype=DataType.INT64, is_primary=True, auto_id=False
+ ),
+ FieldSchema(
+ name="vector",
+ dtype=DataType.FLOAT_VECTOR,
+ dim=self.config["vector_dimensions"],
+ ),
+ ]
+ schema = CollectionSchema(
+ fields,
+ f"Benchmark collection with {self.config['vector_dimensions']}D vectors",
+ )
+
+ # Create collection
+ self.collection = Collection(collection_name, schema)
+ self.logger.info(f"Created collection: {collection_name}")
+ return True
+ except Exception as e:
+ self.logger.error(f"Failed to create collection: {e}")
+ return False
+
+ def generate_vectors(self, count: int) -> Tuple[List[int], List[List[float]]]:
+ """Generate random vectors for benchmarking"""
+ ids = list(range(count))
+ vectors = (
+ np.random.random((count, self.config["vector_dimensions"]))
+ .astype(np.float32)
+ .tolist()
+ )
+ return ids, vectors
+
+ def benchmark_insert(self) -> bool:
+ """Benchmark vector insertion performance"""
+ try:
+ self.logger.info("Starting insert benchmark...")
+
+ batch_size = 1000
+ total_vectors = self.config["vector_dataset_size"]
+
+ insert_times = []
+
+ for i in range(0, total_vectors, batch_size):
+ current_batch_size = min(batch_size, total_vectors - i)
+
+ # Generate batch data
+ ids, vectors = self.generate_vectors(current_batch_size)
+            ids = [vec_id + i for vec_id in ids]  # Ensure unique IDs
+
+ # Insert batch
+ start_time = time.time()
+ self.collection.insert([ids, vectors])
+ insert_time = time.time() - start_time
+ insert_times.append(insert_time)
+
+ if (i // batch_size) % 100 == 0:
+ self.logger.info(
+ f"Inserted {i + current_batch_size}/{total_vectors} vectors"
+ )
+
+ # Flush to ensure data is persisted
+ self.logger.info("Flushing collection...")
+ flush_start = time.time()
+ self.collection.flush()
+ flush_time = time.time() - flush_start
+
+ # Calculate statistics
+ total_insert_time = sum(insert_times)
+ avg_insert_time = total_insert_time / len(insert_times)
+ vectors_per_second = total_vectors / total_insert_time
+
+ self.results["insert_performance"] = {
+ "total_vectors": total_vectors,
+ "total_time_seconds": total_insert_time,
+ "flush_time_seconds": flush_time,
+ "average_batch_time_seconds": avg_insert_time,
+ "vectors_per_second": vectors_per_second,
+ "batch_size": batch_size,
+ }
+
+ self.logger.info(
+ f"Insert benchmark completed: {vectors_per_second:.2f} vectors/sec"
+ )
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Insert benchmark failed: {e}")
+ return False
+
+ def benchmark_index_creation(self) -> bool:
+ """Benchmark index creation performance"""
+ try:
+ self.logger.info("Starting index creation benchmark...")
+
+ index_params = {
+ "metric_type": "L2",
+ "index_type": self.config["index_type"],
+ "params": {},
+ }
+
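+            # HNSW: M bounds per-node graph connectivity and efConstruction the
+            # build-time candidate list size. IVF_FLAT: nlist is the number of
+            # coarse clusters built; nprobe (used at query time) selects how
+            # many of them are searched.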
+ if self.config["index_type"] == "HNSW":
+ index_params["params"] = {
+ "M": self.config.get("index_hnsw_m", 16),
+ "efConstruction": self.config.get(
+ "index_hnsw_ef_construction", 200
+ ),
+ }
+ elif self.config["index_type"] == "IVF_FLAT":
+ index_params["params"] = {
+ "nlist": self.config.get("index_ivf_nlist", 1024)
+ }
+
+ start_time = time.time()
+ self.collection.create_index("vector", index_params)
+ index_time = time.time() - start_time
+
+ self.results["index_performance"] = {
+ "index_type": self.config["index_type"],
+ "index_params": index_params,
+ "creation_time_seconds": index_time,
+ }
+
+ self.logger.info(f"Index creation completed in {index_time:.2f} seconds")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Index creation failed: {e}")
+ return False
+
+ def benchmark_queries(self) -> bool:
+ """Benchmark query performance"""
+ try:
+ self.logger.info("Starting query benchmark...")
+
+ # Load collection with timeout and retry logic
+ self.logger.info("Loading collection into memory...")
+ max_retries = 3
+ retry_count = 0
+ load_success = False
+
+ while retry_count < max_retries and not load_success:
+ try:
+ # First, ensure the collection is released if previously loaded
+ if utility.load_state(self.collection.name) != LoadState.NotLoad:
+ self.logger.info("Releasing existing collection load...")
+ self.collection.release()
+ time.sleep(5) # Wait for release to complete
+
+ # Now load the collection with explicit timeout
+ # For large collections, we may need to adjust replica number
+ self.logger.info(
+ f"Loading collection (attempt {retry_count + 1}/{max_retries})..."
+ )
+ # Check collection size first
+ collection_stats = self.collection.num_entities
+ self.logger.info(f"Collection has {collection_stats} entities")
+
+ # For very large collections, load with specific parameters
+ if collection_stats > 500000:
+ self.logger.info(
+ "Large collection detected, using optimized loading parameters"
+ )
+ self.collection.load(
+ replica_number=1, timeout=1200
+ ) # 20 minute timeout for large collections
+ max_wait_time = (
+ 1800 # 30 minutes max wait for large collections
+ )
+ else:
+ self.collection.load(timeout=300) # 5 minute timeout
+ max_wait_time = 600 # 10 minutes max wait
+
+ # Wait for the collection to be fully loaded
+ start_wait = time.time()
+
+ while time.time() - start_wait < max_wait_time:
+ load_state = utility.load_state(self.collection.name)
+ if load_state == LoadState.Loaded:
+ self.logger.info(
+ "Collection successfully loaded into memory"
+ )
+ load_success = True
+ break
+ elif load_state == LoadState.Loading:
+ try:
+ progress = utility.loading_progress(
+ self.collection.name
+ )
+ self.logger.info(f"Loading progress: {progress}%")
+ except Exception as e:
+ self.logger.warning(
+ f"Could not get loading progress: {e}"
+ )
+ time.sleep(10) # Check every 10 seconds
+ else:
+ self.logger.warning(f"Unexpected load state: {load_state}")
+ break
+
+ if not load_success:
+ self.logger.warning(
+ f"Collection loading timed out after {max_wait_time} seconds"
+ )
+ retry_count += 1
+ if retry_count < max_retries:
+ self.logger.info("Retrying collection load...")
+ time.sleep(30) # Wait before retry
+
+ except Exception as e:
+ self.logger.error(f"Error loading collection: {e}")
+ retry_count += 1
+ if retry_count < max_retries:
+ self.logger.info("Retrying after error...")
+ time.sleep(30)
+ else:
+ raise
+
+ if not load_success:
+ self.logger.error("Failed to load collection after all retries")
+ return False
+
+ # Generate query vectors
+ query_count = 1000
+ _, query_vectors = self.generate_vectors(query_count)
+
+ query_results = {}
+
+ # Test different top-k values
+ topk_values = []
+ if self.config.get("benchmark_query_topk_1", False):
+ topk_values.append(1)
+ if self.config.get("benchmark_query_topk_10", False):
+ topk_values.append(10)
+ if self.config.get("benchmark_query_topk_100", False):
+ topk_values.append(100)
+
+ # Test different batch sizes
+ batch_sizes = []
+ if self.config.get("benchmark_batch_1", False):
+ batch_sizes.append(1)
+ if self.config.get("benchmark_batch_10", False):
+ batch_sizes.append(10)
+ if self.config.get("benchmark_batch_100", False):
+ batch_sizes.append(100)
+
+ for topk in topk_values:
+ query_results[f"topk_{topk}"] = {}
+
+ search_params = {"metric_type": "L2", "params": {}}
+ if self.config["index_type"] == "HNSW":
+ # For HNSW, ef must be at least as large as topk
+ default_ef = self.config.get("index_hnsw_ef", 64)
+ search_params["params"]["ef"] = max(default_ef, topk)
+ elif self.config["index_type"] == "IVF_FLAT":
+ search_params["params"]["nprobe"] = self.config.get(
+ "index_ivf_nprobe", 16
+ )
+
+ for batch_size in batch_sizes:
+ self.logger.info(f"Testing topk={topk}, batch_size={batch_size}")
+
+ times = []
+ for i in range(
+ 0, min(query_count, 100), batch_size
+ ): # Limit to 100 queries for speed
+ batch_vectors = query_vectors[i : i + batch_size]
+
+ start_time = time.time()
+ results = self.collection.search(
+ batch_vectors,
+ "vector",
+ search_params,
+ limit=topk,
+ output_fields=["id"],
+ )
+ query_time = time.time() - start_time
+ times.append(query_time)
+
+ avg_time = sum(times) / len(times)
+ qps = batch_size / avg_time
+
+ query_results[f"topk_{topk}"][f"batch_{batch_size}"] = {
+ "average_time_seconds": avg_time,
+ "queries_per_second": qps,
+ "total_queries": len(times) * batch_size,
+ }
+
+ self.results["query_performance"] = query_results
+ self.logger.info("Query benchmark completed")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Query benchmark failed: {e}")
+ return False
+
+ def run_benchmark(self) -> bool:
+ """Run complete benchmark suite"""
+ self.logger.info("Starting Milvus benchmark suite...")
+
+ # Detect filesystem information
+ fs_info = self.get_filesystem_info("/data")
+ self.results["system_info"] = fs_info
+ # Also add filesystem at top level for compatibility with existing graphs
+ self.results["filesystem"] = fs_info["filesystem"]
+ self.logger.info(
+ f"Detected filesystem: {fs_info['filesystem']} at {fs_info['mount_point']}"
+ )
+
+ if not self.connect_to_milvus():
+ return False
+
+ if not self.create_collection():
+ return False
+
+ if not self.benchmark_insert():
+ return False
+
+ if not self.benchmark_index_creation():
+ return False
+
+ if not self.benchmark_queries():
+ return False
+
+ self.logger.info("Benchmark suite completed successfully")
+ return True
+
+ def save_results(self, output_file: str):
+ """Save benchmark results to file"""
+ try:
+ with open(output_file, "w") as f:
+ json.dump(self.results, f, indent=2)
+ self.logger.info(f"Results saved to {output_file}")
+ except Exception as e:
+ self.logger.error(f"Failed to save results: {e}")
+
+
+def main():
+ parser = argparse.ArgumentParser(description="Milvus Vector Database Benchmark")
+ parser.add_argument("--config", required=True, help="JSON configuration file")
+ parser.add_argument("--output", required=True, help="Output results file")
+
+ args = parser.parse_args()
+
+ # Load configuration
+ try:
+ with open(args.config, "r") as f:
+ config = json.load(f)
+ except Exception as e:
+ print(f"Error loading config file: {e}")
+ return 1
+
+ # Run benchmark
+ benchmark = MilvusBenchmark(config)
+ success = benchmark.run_benchmark()
+
+ # Save results
+ benchmark.save_results(args.output)
+
+ return 0 if success else 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/playbooks/roles/ai_run_benchmarks/tasks/main.yml b/playbooks/roles/ai_run_benchmarks/tasks/main.yml
new file mode 100644
index 00000000..81fd5a87
--- /dev/null
+++ b/playbooks/roles/ai_run_benchmarks/tasks/main.yml
@@ -0,0 +1,181 @@
+---
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Clean up any stale lock files from previous runs (force mode)
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/.benchmark.lock"
+ state: absent
+ failed_when: false
+ when: ai_benchmark_force_unlock | default(false) | bool
+ tags: cleanup
+
+- name: Check for ongoing benchmark processes
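+# Guard against concurrent runs: refuse to start if another milvus_benchmark.py
+# process is active or a fresh lock file exists.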
+ ansible.builtin.shell: |
+ pgrep -f "python.*milvus_benchmark\.py" | xargs -r ps -p 2>/dev/null | grep -v "sh -c" | grep -v grep | wc -l || echo "0"
+ register: benchmark_check
+ changed_when: false
+ failed_when: false
+
+- name: Fail if benchmark is already running
+ ansible.builtin.fail:
+ msg: |
+ ERROR: A benchmark is already running on this system!
+ Number of benchmark processes: {{ benchmark_check.stdout }}
+ Please wait for the current benchmark to complete or terminate it before starting a new one.
+ when: benchmark_check.stdout | int > 0
+
+- name: Ensure benchmark results directory exists
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}"
+ state: directory
+ mode: '0755'
+
+- name: Check for benchmark lock file
+ ansible.builtin.stat:
+ path: "{{ ai_benchmark_results_dir }}/.benchmark.lock"
+ register: lock_file
+
+- name: Check if lock file is stale (older than 5 minutes)
+ ansible.builtin.set_fact:
+ lock_is_stale: "{{ (ansible_date_time.epoch | int - lock_file.stat.mtime | default(0) | int) > 300 }}"
+ when: lock_file.stat.exists
+
+- name: Remove stale lock file
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/.benchmark.lock"
+ state: absent
+ when:
+ - lock_file.stat.exists
+ - lock_is_stale|default(false)|bool
+
+- name: Fail if recent benchmark lock exists
+ ansible.builtin.fail:
+ msg: |
+ ERROR: Benchmark lock file exists at {{ ai_benchmark_results_dir }}/.benchmark.lock
+ This indicates a benchmark may be in progress or was terminated abnormally.
+ Lock file age: {{ (ansible_date_time.epoch | int - lock_file.stat.mtime | default(0) | int) }} seconds
+ If you're sure no benchmark is running, remove the lock file manually.
+ when:
+ - lock_file.stat.exists
+ - not lock_is_stale|default(false)|bool
+
+- name: Run benchmark with lock management
+ block:
+ - name: Create benchmark lock file
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/.benchmark.lock"
+ state: touch
+ mode: '0644'
+ register: lock_created
+
+ - name: Create benchmark working directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/workdir"
+ state: directory
+ mode: '0755'
+
+ - name: Copy benchmark script
+ ansible.builtin.copy:
+ src: milvus_benchmark.py
+ dest: "{{ ai_benchmark_results_dir }}/workdir/milvus_benchmark.py"
+ mode: '0755'
+
+ - name: Ensure Python venv package is installed
+ ansible.builtin.package:
+ name:
+ - python3-venv
+ - python3-pip
+ - python3-dev
+ state: present
+ become: true
+
+ - name: Clean up any globally installed packages (if accidentally installed)
+ ansible.builtin.shell: |
+ pip3 uninstall -y pymilvus numpy 2>/dev/null || true
+ become: true
+ changed_when: false
+ failed_when: false
+
+ - name: Check if virtual environment exists
+ ansible.builtin.stat:
+ path: "{{ ai_benchmark_results_dir }}/venv/bin/python"
+ register: venv_exists
+
+ - name: Verify virtual environment has required packages
+ block:
+ - name: Check if pymilvus is installed in virtual environment
+ ansible.builtin.command: "{{ ai_benchmark_results_dir }}/venv/bin/python -c 'import pymilvus; print(pymilvus.__version__)'"
+ register: pymilvus_check
+ changed_when: false
+ failed_when: false
+
+ - name: Display current pymilvus version
+ ansible.builtin.debug:
+ msg: "Current pymilvus version: {{ pymilvus_check.stdout }}"
+ when: pymilvus_check.rc == 0
+
+ - name: Virtual environment is not properly configured
+ ansible.builtin.debug:
+ msg: "Virtual environment at {{ ai_benchmark_results_dir }}/venv is missing or incomplete. Please run 'make ai' first to set up the environment."
+ when: not venv_exists.stat.exists or pymilvus_check.rc != 0
+
+ - name: Fail if virtual environment is not ready
+ ansible.builtin.fail:
+ msg: "Virtual environment is not properly configured. Please run 'make ai' to set up the environment first."
+ when: not venv_exists.stat.exists or pymilvus_check.rc != 0
+
+ - name: List installed packages in virtual environment for verification
+ ansible.builtin.command: "{{ ai_benchmark_results_dir }}/venv/bin/pip list"
+ register: pip_list
+ changed_when: false
+
+ - name: Display installed packages
+ ansible.builtin.debug:
+ msg: "Installed packages in venv: {{ pip_list.stdout }}"
+
+ - name: Generate benchmark configuration
+ ansible.builtin.template:
+ src: benchmark_config.json.j2
+ dest: "{{ ai_benchmark_results_dir }}/workdir/benchmark_config.json"
+ mode: '0644'
+
+ - name: Wait for Milvus to be ready
+ ansible.builtin.wait_for:
+ host: "localhost"
+ port: "{{ ai_vector_db_milvus_port }}"
+ delay: 10
+ timeout: 300
+
+ - name: Run Milvus benchmark for iteration {{ item }}
+ ansible.builtin.command: >
+ {{ ai_benchmark_results_dir }}/venv/bin/python
+ {{ ai_benchmark_results_dir }}/workdir/milvus_benchmark.py
+ --config {{ ai_benchmark_results_dir }}/workdir/benchmark_config.json
+ --output {{ ai_benchmark_results_dir }}/results_{{ ansible_hostname }}_{{ item }}.json
+ register: benchmark_result
+ with_sequence: start=1 end={{ ai_benchmark_iterations }}
+ tags: run_benchmark
+
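+    # Each iteration writes results_<hostname>_<n>.json; the ai_collect_results
+    # role later fetches anything matching results_*.json for analysis.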
+ - name: Display benchmark results
+ ansible.builtin.debug:
+ var: benchmark_result
+ when: benchmark_result is defined
+
+ always:
+ - name: Remove benchmark lock file
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/.benchmark.lock"
+ state: absent
+ failed_when: false
+ when: lock_created is defined and lock_created.changed
+
+ - name: Ensure lock file is removed (fallback)
+ ansible.builtin.shell: rm -f {{ ai_benchmark_results_dir }}/.benchmark.lock
+ failed_when: false
+ when: lock_created is defined
diff --git a/playbooks/roles/ai_run_benchmarks/templates/benchmark_config.json.j2 b/playbooks/roles/ai_run_benchmarks/templates/benchmark_config.json.j2
new file mode 100644
index 00000000..9983fc16
--- /dev/null
+++ b/playbooks/roles/ai_run_benchmarks/templates/benchmark_config.json.j2
@@ -0,0 +1,24 @@
+{
+ "host": "localhost",
+ "port": {{ ai_vector_db_milvus_port }},
+ "database_name": "default",
+ "collection_name": "{{ ai_vector_db_milvus_collection_name }}",
+ "vector_dataset_size": {{ ai_vector_db_milvus_dataset_size }},
+ "vector_dimensions": {{ ai_vector_db_milvus_dimension }},
+ "benchmark_runtime": {{ ai_benchmark_runtime|default(60) }},
+ "benchmark_warmup_time": {{ ai_benchmark_warmup_time|default(10) }},
+ "benchmark_query_topk_1": {{ ai_benchmark_query_topk_1|default(true)|lower }},
+ "benchmark_query_topk_10": {{ ai_benchmark_query_topk_10|default(true)|lower }},
+ "benchmark_query_topk_100": {{ ai_benchmark_query_topk_100|default(true)|lower }},
+ "benchmark_batch_1": {{ ai_benchmark_batch_1|default(true)|lower }},
+ "benchmark_batch_10": {{ ai_benchmark_batch_10|default(true)|lower }},
+ "benchmark_batch_100": {{ ai_benchmark_batch_100|default(true)|lower }},
+ "batch_size": {{ ai_vector_db_milvus_batch_size }},
+ "num_queries": {{ ai_vector_db_milvus_num_queries }},
+ "index_type": "{{ ai_index_type|default('HNSW') }}",
+ "index_hnsw_m": {{ ai_index_hnsw_m|default(16) }},
+ "index_hnsw_ef_construction": {{ ai_index_hnsw_ef_construction|default(200) }},
+ "index_hnsw_ef": {{ ai_index_hnsw_ef|default(64) }}{% if ai_index_type|default('HNSW') == "IVF_FLAT" %},
+ "index_ivf_nlist": {{ ai_index_ivf_nlist|default(1024) }},
+ "index_ivf_nprobe": {{ ai_index_ivf_nprobe|default(16) }}{% endif %}
+}
diff --git a/playbooks/roles/ai_setup/tasks/main.yml b/playbooks/roles/ai_setup/tasks/main.yml
new file mode 100644
index 00000000..b894c964
--- /dev/null
+++ b/playbooks/roles/ai_setup/tasks/main.yml
@@ -0,0 +1,115 @@
+---
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Create Docker storage directories
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: directory
+ mode: '0755'
+ loop:
+ - "{{ ai_docker_data_path }}"
+ - "{{ ai_docker_etcd_data_path }}"
+ - "{{ ai_docker_minio_data_path }}"
+ when: ai_milvus_docker | bool
+ become: true
+
+- name: Create Docker network for Milvus
+ community.docker.docker_network:
+ name: "{{ ai_docker_network_name }}"
+ state: present
+ when: ai_milvus_docker | bool
+
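+# Milvus standalone needs etcd (metadata) and MinIO (object storage) running
+# first; all three containers share the dedicated Docker network and store
+# their data under the ai_docker_*_data_path directories created above.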
+- name: Start etcd container
+ community.docker.docker_container:
+ name: "{{ ai_etcd_container_name }}"
+ image: "{{ ai_etcd_container_image_string }}"
+ state: started
+ restart_policy: unless-stopped
+ networks:
+ - name: "{{ ai_docker_network_name }}"
+ ports:
+ - "{{ ai_etcd_client_port }}:2379"
+ - "{{ ai_etcd_peer_port }}:2380"
+ env:
+ ETCD_AUTO_COMPACTION_MODE: revision
+ ETCD_AUTO_COMPACTION_RETENTION: "1000"
+ ETCD_QUOTA_BACKEND_BYTES: "4294967296"
+ ETCD_SNAPSHOT_COUNT: "50000"
+ command: >
+ etcd -advertise-client-urls=http://127.0.0.1:2379
+ -listen-client-urls http://0.0.0.0:2379
+ --data-dir /etcd
+ volumes:
+ - "{{ ai_docker_etcd_data_path }}:/etcd"
+ memory: "{{ ai_etcd_memory_limit }}"
+ when: ai_milvus_docker | bool
+
+- name: Start MinIO container
+ community.docker.docker_container:
+ name: "{{ ai_minio_container_name }}"
+ image: "{{ ai_minio_container_image_string }}"
+ state: started
+ restart_policy: unless-stopped
+ networks:
+ - name: "{{ ai_docker_network_name }}"
+ ports:
+ - "{{ ai_minio_api_port }}:9000"
+ - "{{ ai_minio_console_port }}:9001"
+ env:
+ MINIO_ACCESS_KEY: "{{ ai_minio_access_key }}"
+ MINIO_SECRET_KEY: "{{ ai_minio_secret_key }}"
+    command: server /minio_data --console-address ":9001"
+ volumes:
+ - "{{ ai_docker_minio_data_path }}:/minio_data"
+ memory: "{{ ai_minio_memory_limit }}"
+ when: ai_milvus_docker | bool
+
+- name: Wait for etcd to be ready
+ ansible.builtin.wait_for:
+ host: localhost
+ port: "{{ ai_etcd_client_port }}"
+ timeout: 60
+ when: ai_milvus_docker | bool
+
+- name: Wait for MinIO to be ready
+ ansible.builtin.wait_for:
+ host: localhost
+ port: "{{ ai_minio_api_port }}"
+ timeout: 60
+ when: ai_milvus_docker | bool
+
+- name: Start Milvus container
+ community.docker.docker_container:
+ name: "{{ ai_milvus_container_name }}"
+ image: "{{ ai_milvus_container_image_string }}"
+ state: started
+ restart_policy: unless-stopped
+ networks:
+ - name: "{{ ai_docker_network_name }}"
+ ports:
+ - "{{ ai_milvus_port }}:19530"
+ - "{{ ai_milvus_web_ui_port }}:9091"
+ env:
+ ETCD_ENDPOINTS: "{{ ai_etcd_container_name }}:2379"
+ MINIO_ADDRESS: "{{ ai_minio_container_name }}:9000"
+ MINIO_ACCESS_KEY: "{{ ai_minio_access_key }}"
+ MINIO_SECRET_KEY: "{{ ai_minio_secret_key }}"
+ volumes:
+ - "{{ ai_docker_data_path }}:/var/lib/milvus"
+ memory: "{{ ai_milvus_memory_limit }}"
+ cpus: "{{ ai_milvus_cpu_limit }}"
+    command: milvus run standalone
+ when: ai_milvus_docker | bool
+
+- name: Wait for Milvus to be ready
+ ansible.builtin.wait_for:
+ host: localhost
+ port: "{{ ai_milvus_port }}"
+ timeout: 120
+ when: ai_milvus_docker | bool
diff --git a/playbooks/roles/ai_uninstall/tasks/main.yml b/playbooks/roles/ai_uninstall/tasks/main.yml
new file mode 100644
index 00000000..4d35465b
--- /dev/null
+++ b/playbooks/roles/ai_uninstall/tasks/main.yml
@@ -0,0 +1,62 @@
+---
+- name: Import optional extra_args file
+ ansible.builtin.include_vars: "{{ item }}"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Stop and remove Milvus container
+ community.docker.docker_container:
+ name: "{{ ai_milvus_container_name }}"
+ state: absent
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Stop and remove MinIO container
+ community.docker.docker_container:
+ name: "{{ ai_minio_container_name }}"
+ state: absent
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Stop and remove etcd container
+ community.docker.docker_container:
+ name: "{{ ai_etcd_container_name }}"
+ state: absent
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Remove Docker network
+ community.docker.docker_network:
+ name: "{{ ai_docker_network_name }}"
+ state: absent
+ when: ai_milvus_docker | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Clean up Python packages (optional)
+ ansible.builtin.pip:
+ name:
+ - pymilvus
+ - matplotlib
+ - seaborn
+ - plotly
+ state: absent
+ when: ai_benchmark_enable_graphing | bool
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Display uninstall completion message
+ ansible.builtin.debug:
+ msg: |
+ AI benchmark components uninstalled successfully.
+ Data directories preserved:
+ - {{ ai_docker_data_path }}
+ - {{ ai_benchmark_results_dir }}
+
+ To completely remove all data, run the ai-destroy target.
diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml
index ec11d039..4b35d9f6 100644
--- a/playbooks/roles/gen_hosts/tasks/main.yml
+++ b/playbooks/roles/gen_hosts/tasks/main.yml
@@ -381,6 +381,20 @@
- workflows_reboot_limit
- ansible_hosts_template.stat.exists
+- name: Generate the Ansible hosts file for a dedicated AI setup
+ tags: ['hosts']
+ ansible.builtin.template:
+ src: "{{ kdevops_hosts_template }}"
+ dest: "{{ ansible_cfg_inventory }}"
+ force: true
+ trim_blocks: True
+ lstrip_blocks: True
+ mode: '0644'
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ansible_hosts_template.stat.exists
+
- name: Verify if final host file exists
ansible.builtin.stat:
path: "{{ ansible_cfg_inventory }}"
diff --git a/playbooks/roles/gen_hosts/templates/hosts.j2 b/playbooks/roles/gen_hosts/templates/hosts.j2
index 6d83191d..cdcd1883 100644
--- a/playbooks/roles/gen_hosts/templates/hosts.j2
+++ b/playbooks/roles/gen_hosts/templates/hosts.j2
@@ -77,6 +77,114 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
[service:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% elif kdevops_workflow_enable_ai %}
+{% if ai_enable_multifs_testing|default(false)|bool %}
+{# Multi-filesystem section-based hosts #}
+[all]
+localhost ansible_connection=local
+{% for node in all_generic_nodes %}
+{{ node }}
+{% endfor %}
+
+[all:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+[baseline]
+{% for node in all_generic_nodes %}
+{% if not node.endswith('-dev') %}
+{{ node }}
+{% endif %}
+{% endfor %}
+
+[baseline:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% if kdevops_baseline_and_dev %}
+[dev]
+{% for node in all_generic_nodes %}
+{% if node.endswith('-dev') %}
+{{ node }}
+{% endif %}
+{% endfor %}
+
+[dev:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% endif %}
+[ai]
+{% for node in all_generic_nodes %}
+{{ node }}
+{% endfor %}
+
+[ai:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
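+{# Node names are expected to look like <prefix>-ai-<fstype>[-<opts...>][-dev],
+   e.g. debian13-ai-xfs-16k-4ks-dev; each unique <fstype>[_<opts>] combination
+   becomes an [ai_<group>] section below. #}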
+{% set fs_configs = [] %}
+{% for node in all_generic_nodes %}
+{% set node_parts = node.split('-') %}
+{% if node_parts|length >= 3 %}
+{% set fs_type = node_parts[2] %}
+{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
+{% set fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
+{% if fs_group not in fs_configs %}
+{% set _ = fs_configs.append(fs_group) %}
+{% endif %}
+{% endif %}
+{% endfor %}
+
+{% for fs_group in fs_configs %}
+[ai_{{ fs_group }}]
+{% for node in all_generic_nodes %}
+{% set node_parts = node.split('-') %}
+{% if node_parts|length >= 3 %}
+{% set fs_type = node_parts[2] %}
+{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
+{% set node_fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
+{% if node_fs_group == fs_group %}
+{{ node }}
+{% endif %}
+{% endif %}
+{% endfor %}
+
+[ai_{{ fs_group }}:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% endfor %}
+{% else %}
+{# Single-node AI hosts #}
+[all]
+localhost ansible_connection=local
+{{ kdevops_host_prefix }}-ai
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-ai-dev
+{% endif %}
+
+[all:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+[baseline]
+{{ kdevops_host_prefix }}-ai
+
+[baseline:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% if kdevops_baseline_and_dev %}
+[dev]
+{{ kdevops_host_prefix }}-ai-dev
+
+[dev:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+
+{% endif %}
+[ai]
+{{ kdevops_host_prefix }}-ai
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-ai-dev
+{% endif %}
+
+[ai:vars]
+ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% endif %}
{% else %}
[all]
localhost ansible_connection=local
diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml
index a8598481..d54977be 100644
--- a/playbooks/roles/gen_nodes/tasks/main.yml
+++ b/playbooks/roles/gen_nodes/tasks/main.yml
@@ -642,6 +642,40 @@
- ansible_nodes_template.stat.exists
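+# The dedicated AI workflow generates either a single <prefix>-ai node, or a
+# <prefix>-ai and <prefix>-ai-dev pair when baseline and dev (A/B) nodes are
+# enabled.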
+- name: Generate the AI kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template
+ tags: ['hosts']
+ vars:
+ node_template: "{{ kdevops_nodes_template | basename }}"
+ nodes: "{{ [kdevops_host_prefix + '-ai'] }}"
+ all_generic_nodes: "{{ [kdevops_host_prefix + '-ai'] }}"
+ ansible.builtin.template:
+ src: "{{ node_template }}"
+ dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
+ force: true
+ mode: '0644'
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ansible_nodes_template.stat.exists
+ - not kdevops_baseline_and_dev
+
+- name: Generate the AI kdevops nodes file with dev hosts using {{ kdevops_nodes_template }} as jinja2 source template
+ tags: ['hosts']
+ vars:
+ node_template: "{{ kdevops_nodes_template | basename }}"
+ nodes: "{{ [kdevops_host_prefix + '-ai', kdevops_host_prefix + '-ai-dev'] }}"
+ all_generic_nodes: "{{ [kdevops_host_prefix + '-ai', kdevops_host_prefix + '-ai-dev'] }}"
+ ansible.builtin.template:
+ src: "{{ node_template }}"
+ dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
+ force: true
+ mode: '0644'
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ansible_nodes_template.stat.exists
+ - kdevops_baseline_and_dev
+
- name: Get the control host's timezone
ansible.builtin.command: "timedatectl show -p Timezone --value"
register: kdevops_host_timezone
diff --git a/playbooks/roles/milvus/README.md b/playbooks/roles/milvus/README.md
new file mode 100644
index 00000000..e6571167
--- /dev/null
+++ b/playbooks/roles/milvus/README.md
@@ -0,0 +1,181 @@
+# Milvus Vector Database Role
+
+This Ansible role manages the Milvus vector database for AI benchmarking in kdevops.
+
+## Overview
+
+Milvus is an open-source vector database designed for embedding similarity search
+and AI applications. This role provides:
+
+- Docker-based deployment with etcd and MinIO
+- Comprehensive performance benchmarking
+- Scalable testing from small to large datasets
+- Multiple index type support (HNSW, IVF_FLAT, etc.)
+
+## Role Variables
+
+### Required Variables
+
+- `ai_vector_db_milvus_enable`: Enable/disable Milvus deployment
+- `ai_vector_db_milvus_dimension`: Vector dimension size (default: 768)
+- `ai_vector_db_milvus_dataset_size`: Number of vectors to test (default: 1000000)
+
+### Docker Configuration
+
+- `ai_vector_db_milvus_container_name`: Milvus container name
+- `ai_vector_db_milvus_port`: Milvus service port (default: 19530)
+- `ai_vector_db_milvus_memory_limit`: Container memory limit
+- `ai_vector_db_milvus_cpu_limit`: Container CPU limit
+
+### Benchmark Configuration
+
+- `ai_vector_db_milvus_batch_size`: Insertion batch size
+- `ai_vector_db_milvus_num_queries`: Number of search queries
+- `ai_benchmark_iterations`: Number of benchmark iterations
+- `ai_benchmark_results_dir`: Directory for storing results
+
+## Dependencies
+
+For Docker deployment:
+- Docker Engine
+- docker-compose Python package
+
+For benchmarking (see the check sketch below):
+- Python 3.8+
+- pymilvus
+- numpy
+
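+A quick way to check the benchmarking dependencies on a target node is shown
+below (a sketch; the setup tasks run an equivalent pymilvus version probe):
+
+```python
+# Minimal dependency probe; the module names match the list above.
+import importlib
+
+for mod in ("pymilvus", "numpy"):
+    try:
+        m = importlib.import_module(mod)
+        print(mod, getattr(m, "__version__", "unknown"))
+    except ImportError:
+        print(mod, "missing")
+```
+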
+## Directory Structure
+
+```
+milvus/
+├── defaults/
+│ └── main.yml # Default variables
+├── tasks/
+│ ├── main.yml # Task router based on action
+│ ├── install_docker.yml # Docker installation tasks
+│ ├── setup.yml # Environment setup
+│ ├── benchmark.yml # Benchmark execution
+│ └── destroy.yml # Cleanup tasks
+├── templates/
+│ ├── docker-compose.yml.j2 # Docker compose configuration
+│ ├── benchmark_config.json.j2 # Benchmark parameters
+│ └── test_connection.py.j2 # Connection test script
+├── files/
+│ ├── milvus_benchmark.py # Main benchmark script
+│ └── milvus_utils.py # Utility functions
+└── meta/
+ └── main.yml # Role metadata
+```
+
+## Usage Examples
+
+### Basic Installation
+
+```yaml
+- name: Install Milvus
+ hosts: ai
+ roles:
+ - role: milvus
+ vars:
+ action: install
+```
+
+### Run Benchmarks
+
+```yaml
+- name: Benchmark Milvus
+ hosts: ai
+ roles:
+ - role: milvus
+ vars:
+ action: benchmark
+ ai_vector_db_milvus_dataset_size: 1000000
+ ai_vector_db_milvus_dimension: 768
+```
+
+### Cleanup
+
+```yaml
+- name: Destroy Milvus
+ hosts: ai
+ roles:
+ - role: milvus
+ vars:
+ action: destroy
+```
+
+## Benchmark Metrics
+
+The benchmark collects the following metrics (a sketch of the arithmetic follows the list):
+
+1. **Insertion Performance**
+ - Total insertion time
+ - Average throughput (vectors/second)
+ - Batch-level statistics
+
+2. **Search Performance**
+ - Query latency (ms)
+ - Queries per second (QPS)
+ - Top-K accuracy
+
+3. **Index Performance**
+ - Index build time
+ - Index memory usage
+ - Search performance by index type
+
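+The throughput and latency figures above are simple derivations from
+wall-clock timings; a minimal sketch with made-up timings (the real
+measurements come from `milvus_benchmark.py` under `files/`):
+
+```python
+# Sketch of the derived metrics; the timings below are illustrative only.
+from typing import List
+
+
+def insert_throughput(batch_times: List[float], total_vectors: int) -> float:
+    """Vectors per second across all insert batches."""
+    return total_vectors / sum(batch_times)
+
+
+def search_qps(batch_times: List[float], batch_size: int) -> float:
+    """Queries per second for a fixed search batch size."""
+    avg_time = sum(batch_times) / len(batch_times)
+    return batch_size / avg_time
+
+
+# 10 insert batches of 10,000 vectors at ~1.2 s each -> ~8333 vectors/sec
+print(insert_throughput([1.2] * 10, 100_000))
+# 100 single-vector searches at ~2.3 ms each -> ~434 queries/sec
+print(search_qps([0.0023] * 100, 1))
+```
+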
+## Results
+
+Benchmark results are stored in JSON format:
+
+```json
+{
+ "timestamp": "2024-01-20T10:30:00",
+ "configuration": {
+ "dataset_size": 1000000,
+ "dimension": 768,
+ "index_type": "HNSW"
+ },
+ "insertion": {
+ "total_time": 120.5,
+ "throughput": 8298.75
+ },
+ "search": {
+ "avg_latency": 2.3,
+ "qps": 434.78
+ }
+}
+```
+
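+Because the results are plain JSON they are easy to post-process. A minimal
+sketch (the file name is an example; field names follow the sample above and
+may differ slightly from the exact output of `milvus_benchmark.py`):
+
+```python
+# Load a results file and print the headline numbers.
+import json
+
+with open("results_1705745400.json") as f:
+    results = json.load(f)
+
+print("insert throughput:", results["insertion"]["throughput"], "vectors/sec")
+print("search latency:", results["search"]["avg_latency"], "ms")
+print("search QPS:", results["search"]["qps"])
+```
+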
+## Troubleshooting
+
+### Container Issues
+
+Check container status:
+```bash
+docker ps -a | grep milvus
+docker logs milvus-ai-benchmark
+```
+
+### Connection Issues
+
+Test connectivity:
+```bash
+python3 /tmp/test_milvus_connection.py
+```
+
+### Performance Issues
+
+For large datasets:
+- Increase memory limits in Kconfig
+- Use SSD storage for better performance
+- Adjust batch sizes based on available memory
+
+## Contributing
+
+When modifying this role:
+
+1. Follow Ansible best practices
+2. Update documentation for new features
+3. Test with both small and large datasets
+4. Ensure idempotency of all tasks
diff --git a/playbooks/roles/milvus/defaults/main.yml b/playbooks/roles/milvus/defaults/main.yml
new file mode 100644
index 00000000..a002196c
--- /dev/null
+++ b/playbooks/roles/milvus/defaults/main.yml
@@ -0,0 +1,74 @@
+---
+# Milvus vector database defaults
+ai_vector_db_milvus_version: "2.3.0"
+ai_vector_db_milvus_docker: true
+ai_vector_db_milvus_compose_version: "v2.3.0"
+
+# Deployment options
+ai_vector_db_milvus_data_dir: "/data/milvus"
+ai_vector_db_milvus_config_dir: "/etc/milvus"
+ai_vector_db_milvus_log_dir: "/var/log/milvus"
+
+# Network configuration
+ai_vector_db_milvus_port: 19530
+ai_vector_db_milvus_grpc_port: 19530
+ai_vector_db_milvus_metrics_port: 9091
+ai_vector_db_milvus_web_ui_port: 9091
+ai_vector_db_milvus_etcd_client_port: 2379
+ai_vector_db_milvus_minio_api_port: 9000
+ai_vector_db_milvus_minio_console_port: 9001
+
+# Resource limits
+ai_vector_db_milvus_memory_limit: "8Gi"
+ai_vector_db_milvus_cpu_limit: "4"
+
+# Storage backend
+ai_vector_db_milvus_storage_type: "local" # local, s3, minio
+ai_vector_db_milvus_storage_path: "{{ ai_vector_db_milvus_data_dir }}/storage"
+
+# Index configuration
+ai_vector_db_milvus_index_type: "IVF_FLAT"
+ai_vector_db_milvus_metric_type: "L2"
+ai_vector_db_milvus_nlist: 1024
+
+# Collection defaults
+ai_vector_db_milvus_default_collection: "benchmark_collection"
+ai_vector_db_milvus_default_dim: 768
+ai_vector_db_milvus_default_shards: 2
+
+# Benchmark configuration
+ai_vector_db_milvus_benchmark_enable: true
+ai_vector_db_milvus_benchmark_datasets:
+ - sift1m
+ - gist1m
+ai_vector_db_milvus_benchmark_batch_size: 10000
+ai_vector_db_milvus_benchmark_num_queries: 10000
+
+# Results and filesystem configuration
+ai_benchmark_results_dir: "/data/benchmark-results"
+ai_filesystem: "{{ kdevops_filesystem | default('xfs') }}"
+ai_data_device_path: "/data"
+ai_mkfs_opts: ""
+ai_mount_opts: "defaults"
+
+# Docker container configuration
+ai_vector_db_milvus_container_name: "milvus-standalone"
+ai_vector_db_milvus_etcd_container_name: "milvus-etcd"
+ai_vector_db_milvus_minio_container_name: "milvus-minio"
+
+# Docker image configuration
+ai_vector_db_milvus_container_image_string: "milvusdb/milvus:{{ ai_vector_db_milvus_version }}"
+ai_vector_db_milvus_etcd_container_image_string: "quay.io/coreos/etcd:v3.5.5"
+ai_vector_db_milvus_minio_container_image_string: "minio/minio:RELEASE.2023-03-20T20-16-18Z"
+
+# Docker volume paths
+ai_vector_db_milvus_docker_data_path: "{{ ai_vector_db_milvus_data_dir }}/volumes/milvus"
+ai_vector_db_milvus_docker_etcd_data_path: "{{ ai_vector_db_milvus_data_dir }}/volumes/etcd"
+ai_vector_db_milvus_docker_minio_data_path: "{{ ai_vector_db_milvus_data_dir }}/volumes/minio"
+
+# MinIO configuration
+ai_vector_db_milvus_minio_access_key: "minioadmin"
+ai_vector_db_milvus_minio_secret_key: "minioadmin"
+
+# Docker network
+ai_vector_db_milvus_docker_network_name: "milvus"
diff --git a/playbooks/roles/milvus/files/milvus_benchmark.py b/playbooks/roles/milvus/files/milvus_benchmark.py
new file mode 100644
index 00000000..bd7d5ead
--- /dev/null
+++ b/playbooks/roles/milvus/files/milvus_benchmark.py
@@ -0,0 +1,348 @@
+#!/usr/bin/env python3
+"""
+Milvus Vector Database Benchmark Script
+
+This script performs comprehensive benchmarking of Milvus vector database
+including vector insertion, index creation, and query performance testing.
+"""
+
+import json
+import numpy as np
+import time
+import argparse
+import sys
+from datetime import datetime
+from typing import List, Dict, Any, Tuple
+import logging
+
+try:
+ from pymilvus import (
+ connections,
+ Collection,
+ CollectionSchema,
+ FieldSchema,
+ DataType,
+ utility,
+ )
+except ImportError:
+ print("Error: pymilvus not installed. Please install with: pip install pymilvus")
+ sys.exit(1)
+
+
+class MilvusBenchmark:
+ def __init__(self, config: Dict[str, Any]):
+ self.config = config
+ self.collection = None
+ self.results = {
+ "config": config,
+ "timestamp": datetime.now().isoformat(),
+ "insert_performance": {},
+ "index_performance": {},
+ "query_performance": {},
+ "system_info": {},
+ }
+
+ # Setup logging
+ logging.basicConfig(
+ level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
+ )
+ self.logger = logging.getLogger(__name__)
+
+ def connect_to_milvus(self) -> bool:
+ """Connect to Milvus server"""
+ try:
+ connections.connect(
+ alias="default",
+ host=self.config["milvus"]["host"],
+ port=self.config["milvus"]["port"],
+ )
+ self.logger.info(
+ f"Connected to Milvus at {self.config['milvus']['host']}:{self.config['milvus']['port']}"
+ )
+ return True
+ except Exception as e:
+ self.logger.error(f"Failed to connect to Milvus: {e}")
+ return False
+
+ def create_collection(self) -> bool:
+ """Create benchmark collection"""
+ try:
+ collection_name = self.config["milvus"]["collection_name"]
+
+ # Drop collection if exists
+ if utility.has_collection(collection_name):
+ utility.drop_collection(collection_name)
+ self.logger.info(f"Dropped existing collection: {collection_name}")
+
+ # Define schema
+ fields = [
+ FieldSchema(
+ name="id", dtype=DataType.INT64, is_primary=True, auto_id=False
+ ),
+ FieldSchema(
+ name="vector",
+ dtype=DataType.FLOAT_VECTOR,
+ dim=self.config["milvus"]["dimension"],
+ ),
+ ]
+ schema = CollectionSchema(
+ fields,
+ f"Benchmark collection with {self.config['milvus']['dimension']}D vectors",
+ )
+
+ # Create collection
+ self.collection = Collection(collection_name, schema)
+ self.logger.info(f"Created collection: {collection_name}")
+ return True
+ except Exception as e:
+ self.logger.error(f"Failed to create collection: {e}")
+ return False
+
+ def generate_vectors(self, count: int) -> Tuple[List[int], List[List[float]]]:
+ """Generate random vectors for benchmarking"""
+ ids = list(range(count))
+ vectors = (
+ np.random.random((count, self.config["milvus"]["dimension"]))
+ .astype(np.float32)
+ .tolist()
+ )
+ return ids, vectors
+
+ def benchmark_insert(self) -> bool:
+ """Benchmark vector insertion performance"""
+ try:
+ self.logger.info("Starting insert benchmark...")
+
+ batch_size = self.config["benchmark"]["batch_size"]
+ total_vectors = self.config["benchmark"][
+ "num_queries"
+ ] # Use num_queries as dataset size
+
+ insert_times = []
+
+ for i in range(0, total_vectors, batch_size):
+ current_batch_size = min(batch_size, total_vectors - i)
+
+ # Generate batch data
+ ids, vectors = self.generate_vectors(current_batch_size)
+ ids = [vec_id + i for vec_id in ids]  # Ensure unique IDs across batches
+
+ # Insert batch
+ start_time = time.time()
+ self.collection.insert([ids, vectors])
+ insert_time = time.time() - start_time
+ insert_times.append(insert_time)
+
+ if (i // batch_size) % 100 == 0:
+ self.logger.info(
+ f"Inserted {i + current_batch_size}/{total_vectors} vectors"
+ )
+
+ # Flush to ensure data is persisted
+ self.logger.info("Flushing collection...")
+ flush_start = time.time()
+ self.collection.flush()
+ flush_time = time.time() - flush_start
+
+ # Calculate statistics
+ total_insert_time = sum(insert_times)
+ avg_insert_time = total_insert_time / len(insert_times)
+ vectors_per_second = total_vectors / total_insert_time
+
+ self.results["insert_performance"] = {
+ "total_vectors": total_vectors,
+ "total_time_seconds": total_insert_time,
+ "flush_time_seconds": flush_time,
+ "average_batch_time_seconds": avg_insert_time,
+ "vectors_per_second": vectors_per_second,
+ "batch_size": batch_size,
+ }
+
+ self.logger.info(
+ f"Insert benchmark completed: {vectors_per_second:.2f} vectors/sec"
+ )
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Insert benchmark failed: {e}")
+ return False
+
+ def benchmark_index_creation(self) -> bool:
+ """Benchmark index creation performance"""
+ try:
+ self.logger.info("Starting index creation benchmark...")
+
+ index_params = {
+ "metric_type": "L2",
+ "index_type": self.config["milvus"]["index_type"],
+ "params": {},
+ }
+
+ if self.config["milvus"]["index_type"] == "HNSW":
+ index_params["params"] = {
+ "M": self.config.get("index_hnsw_m", 16),
+ "efConstruction": self.config.get(
+ "index_hnsw_ef_construction", 200
+ ),
+ }
+ elif self.config["milvus"]["index_type"] == "IVF_FLAT":
+ index_params["params"] = {
+ "nlist": self.config.get("index_ivf_nlist", 1024)
+ }
+
+ start_time = time.time()
+ self.collection.create_index("vector", index_params)
+ index_time = time.time() - start_time
+
+ self.results["index_performance"] = {
+ "index_type": self.config["milvus"]["index_type"],
+ "index_params": index_params,
+ "creation_time_seconds": index_time,
+ }
+
+ self.logger.info(f"Index creation completed in {index_time:.2f} seconds")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Index creation failed: {e}")
+ return False
+
+ def benchmark_queries(self) -> bool:
+ """Benchmark query performance"""
+ try:
+ self.logger.info("Starting query benchmark...")
+
+ # Load collection
+ self.collection.load()
+
+ # Generate query vectors
+ query_count = 1000
+ _, query_vectors = self.generate_vectors(query_count)
+
+ query_results = {}
+
+ # Test different top-k values
+ topk_values = []
+ if self.config.get("benchmark_query_topk_1", False):
+ topk_values.append(1)
+ if self.config.get("benchmark_query_topk_10", False):
+ topk_values.append(10)
+ if self.config.get("benchmark_query_topk_100", False):
+ topk_values.append(100)
+
+ # Test different batch sizes
+ batch_sizes = []
+ if self.config.get("benchmark_batch_1", False):
+ batch_sizes.append(1)
+ if self.config.get("benchmark_batch_10", False):
+ batch_sizes.append(10)
+ if self.config.get("benchmark_batch_100", False):
+ batch_sizes.append(100)
+
+ for topk in topk_values:
+ query_results[f"topk_{topk}"] = {}
+
+ search_params = {"metric_type": "L2", "params": {}}
+ if self.config["milvus"]["index_type"] == "HNSW":
+ search_params["params"]["ef"] = self.config.get("index_hnsw_ef", 64)
+ elif self.config["milvus"]["index_type"] == "IVF_FLAT":
+ search_params["params"]["nprobe"] = self.config.get(
+ "index_ivf_nprobe", 16
+ )
+
+ for batch_size in batch_sizes:
+ self.logger.info(f"Testing topk={topk}, batch_size={batch_size}")
+
+ times = []
+ for i in range(
+ 0, min(query_count, 100), batch_size
+ ): # Limit to 100 queries for speed
+ batch_vectors = query_vectors[i : i + batch_size]
+
+ start_time = time.time()
+ results = self.collection.search(
+ batch_vectors,
+ "vector",
+ search_params,
+ limit=topk,
+ output_fields=["id"],
+ )
+ query_time = time.time() - start_time
+ times.append(query_time)
+
+ avg_time = sum(times) / len(times)
+ qps = batch_size / avg_time
+
+ query_results[f"topk_{topk}"][f"batch_{batch_size}"] = {
+ "average_time_seconds": avg_time,
+ "queries_per_second": qps,
+ "total_queries": len(times) * batch_size,
+ }
+
+ self.results["query_performance"] = query_results
+ self.logger.info("Query benchmark completed")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Query benchmark failed: {e}")
+ return False
+
+ def run_benchmark(self) -> bool:
+ """Run complete benchmark suite"""
+ self.logger.info("Starting Milvus benchmark suite...")
+
+ if not self.connect_to_milvus():
+ return False
+
+ if not self.create_collection():
+ return False
+
+ if not self.benchmark_insert():
+ return False
+
+ if not self.benchmark_index_creation():
+ return False
+
+ if not self.benchmark_queries():
+ return False
+
+ self.logger.info("Benchmark suite completed successfully")
+ return True
+
+ def save_results(self, output_file: str):
+ """Save benchmark results to file"""
+ try:
+ with open(output_file, "w") as f:
+ json.dump(self.results, f, indent=2)
+ self.logger.info(f"Results saved to {output_file}")
+ except Exception as e:
+ self.logger.error(f"Failed to save results: {e}")
+
+
+def main():
+ parser = argparse.ArgumentParser(description="Milvus Vector Database Benchmark")
+ parser.add_argument("--config", required=True, help="JSON configuration file")
+ parser.add_argument("--output", required=True, help="Output results file")
+
+ args = parser.parse_args()
+
+ # Load configuration
+ try:
+ with open(args.config, "r") as f:
+ config = json.load(f)
+ except Exception as e:
+ print(f"Error loading config file: {e}")
+ return 1
+
+ # Run benchmark
+ benchmark = MilvusBenchmark(config)
+ success = benchmark.run_benchmark()
+
+ # Save results
+ benchmark.save_results(args.output)
+
+ return 0 if success else 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/playbooks/roles/milvus/files/milvus_utils.py b/playbooks/roles/milvus/files/milvus_utils.py
new file mode 100644
index 00000000..15b8af4f
--- /dev/null
+++ b/playbooks/roles/milvus/files/milvus_utils.py
@@ -0,0 +1,134 @@
+#!/usr/bin/env python3
+"""
+Utility functions for Milvus benchmarking
+"""
+
+import numpy as np
+import time
+from typing import List, Dict, Any
+from pymilvus import Collection, utility
+
+
+def generate_random_vectors(dim: int, count: int) -> np.ndarray:
+ """Generate random vectors for testing"""
+ return np.random.random((count, dim)).astype("float32")
+
+
+def create_collection(name: str, dim: int, metric_type: str = "L2") -> Collection:
+ """Create a Milvus collection with specified parameters"""
+ from pymilvus import CollectionSchema, FieldSchema, DataType
+
+ fields = [
+ FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
+ FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim),
+ ]
+
+ schema = CollectionSchema(
+ fields=fields, description=f"Benchmark collection dim={dim}"
+ )
+ collection = Collection(name=name, schema=schema)
+
+ return collection
+
+
+def create_index(
+ collection: Collection, index_type: str = "IVF_FLAT", nlist: int = 1024
+):
+ """Create an index on the collection"""
+ index_params = {
+ "metric_type": "L2",
+ "index_type": index_type,
+ "params": {"nlist": nlist},
+ }
+
+ collection.create_index(field_name="embedding", index_params=index_params)
+ collection.load()
+
+
+def benchmark_insert(
+ collection: Collection, vectors: np.ndarray, batch_size: int = 10000
+) -> Dict[str, Any]:
+ """Benchmark vector insertion"""
+ total_vectors = len(vectors)
+ results = {
+ "total_vectors": total_vectors,
+ "batch_size": batch_size,
+ "batches": [],
+ "total_time": 0,
+ }
+
+ start_time = time.time()
+
+ for i in range(0, total_vectors, batch_size):
+ batch_start = time.time()
+ batch_vectors = vectors[i : i + batch_size].tolist()
+
+ collection.insert([batch_vectors])
+
+ batch_time = time.time() - batch_start
+ results["batches"].append(
+ {
+ "batch_idx": i // batch_size,
+ "vectors": len(batch_vectors),
+ "time": batch_time,
+ "throughput": len(batch_vectors) / batch_time,
+ }
+ )
+
+ collection.flush()
+
+ results["total_time"] = time.time() - start_time
+ results["avg_throughput"] = total_vectors / results["total_time"]
+
+ return results
+
+
+def benchmark_search(
+ collection: Collection, query_vectors: np.ndarray, top_k: int = 10, nprobe: int = 10
+) -> Dict[str, Any]:
+ """Benchmark vector search"""
+ search_params = {"metric_type": "L2", "params": {"nprobe": nprobe}}
+
+ results = {
+ "num_queries": len(query_vectors),
+ "top_k": top_k,
+ "nprobe": nprobe,
+ "queries": [],
+ "total_time": 0,
+ }
+
+ start_time = time.time()
+
+ for i, query in enumerate(query_vectors):
+ query_start = time.time()
+
+ search_results = collection.search(
+ data=[query.tolist()],
+ anns_field="embedding",
+ param=search_params,
+ limit=top_k,
+ )
+
+ query_time = time.time() - query_start
+ results["queries"].append(
+ {"query_idx": i, "time": query_time, "num_results": len(search_results[0])}
+ )
+
+ results["total_time"] = time.time() - start_time
+ results["avg_latency"] = results["total_time"] / len(query_vectors)
+ results["qps"] = len(query_vectors) / results["total_time"]
+
+ return results
+
+
+def get_collection_stats(collection: Collection) -> Dict[str, Any]:
+ """Get collection statistics"""
+ collection.flush()
+ stats = collection.num_entities
+
+ return {
+ "name": collection.name,
+ "num_entities": stats,
+ "loaded": utility.load_state(collection.name).name,
+ "index": collection.indexes,
+ }
diff --git a/playbooks/roles/milvus/meta/main.yml b/playbooks/roles/milvus/meta/main.yml
new file mode 100644
index 00000000..6af514b7
--- /dev/null
+++ b/playbooks/roles/milvus/meta/main.yml
@@ -0,0 +1,30 @@
+---
+galaxy_info:
+ author: kdevops AI team
+ description: Milvus vector database installation and setup for AI workflows
+ company: kdevops
+ license: copyleft-next-0.3.1
+ min_ansible_version: 2.9
+ platforms:
+ - name: Debian
+ versions:
+ - bookworm
+ - bullseye
+ - name: Ubuntu
+ versions:
+ - jammy
+ - focal
+ - name: Fedora
+ versions:
+ - all
+ - name: EL
+ versions:
+ - 8
+ - 9
+ galaxy_tags:
+ - ai
+ - vector_database
+ - milvus
+ - machine_learning
+
+dependencies: []
diff --git a/playbooks/roles/milvus/tasks/benchmark.yml b/playbooks/roles/milvus/tasks/benchmark.yml
new file mode 100644
index 00000000..222a00e9
--- /dev/null
+++ b/playbooks/roles/milvus/tasks/benchmark.yml
@@ -0,0 +1,61 @@
+---
+# Check if Milvus is actually running before attempting benchmarks
+- name: Check if Milvus is accessible
+ ansible.builtin.wait_for:
+ port: "{{ ai_vector_db_milvus_port }}"
+ host: localhost
+ timeout: 5
+ state: started
+ register: milvus_running
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
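+# The probe above never fails the play (failed_when: false); the flag below is
+# what decides whether the benchmark block runs at all.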
+- name: Set Milvus availability flag
+ ansible.builtin.set_fact:
+ milvus_is_available: "{{ milvus_running.failed is not defined or not milvus_running.failed }}"
+
+- name: Debug Milvus check result
+ ansible.builtin.debug:
+ msg: |
+ Milvus check result: {{ milvus_running }}
+ Is succeeded: {{ milvus_running is succeeded }}
+ Is failed: {{ milvus_running is failed }}
+ Milvus is available: {{ milvus_is_available }}
+
+- name: Skip benchmarks if Milvus is not running
+ ansible.builtin.debug:
+ msg: |
+ Milvus is not running on port {{ ai_vector_db_milvus_port }}.
+ In native mode, the Milvus server is not available.
+ Skipping benchmarks. Use Docker mode for full functionality.
+ when: not milvus_is_available
+
+- name: Run benchmark tasks only if Milvus is available
+ block:
+ - name: Create benchmark results directory
+ ansible.builtin.file:
+ path: "{{ ai_benchmark_results_dir }}/milvus"
+ state: directory
+ mode: '0755'
+
+ - name: Generate benchmark configuration
+ ansible.builtin.template:
+ src: benchmark_config.json.j2
+ dest: "{{ ai_vector_db_milvus_data_dir }}/scripts/benchmark_config.json"
+ mode: '0644'
+
+ - name: Run Milvus benchmarks
+ ansible.builtin.command: >
+ python3 {{ ai_vector_db_milvus_data_dir }}/scripts/milvus_benchmark.py
+ --config {{ ai_vector_db_milvus_data_dir }}/scripts/benchmark_config.json
+ --output {{ ai_benchmark_results_dir }}/milvus/results_{{ ansible_date_time.epoch }}.json
+ register: benchmark_result
+ when: ai_vector_db_milvus_benchmark_enable | bool
+
+ - name: Display benchmark summary
+ ansible.builtin.debug:
+ msg: "{{ benchmark_result.stdout_lines[-20:] }}"
+ when:
+ - ai_vector_db_milvus_benchmark_enable|bool
+ - benchmark_result is defined
+ when: milvus_is_available
diff --git a/playbooks/roles/milvus/tasks/benchmark_setup.yml b/playbooks/roles/milvus/tasks/benchmark_setup.yml
new file mode 100644
index 00000000..68ce18e4
--- /dev/null
+++ b/playbooks/roles/milvus/tasks/benchmark_setup.yml
@@ -0,0 +1,58 @@
+---
+# Setup benchmark scripts and directories only
+# This is used when running benchmarks on already-setup infrastructure
+
+- name: Ensure Python dependencies are installed
+ ansible.builtin.package:
+ name:
+ - python3-numpy
+ - python3-pandas
+ - python3-tqdm
+ - python3-pip
+ state: present
+ become: true
+
+- name: Check if pymilvus is installed
+ ansible.builtin.command: python3 -c "import pymilvus; print(pymilvus.__version__)"
+ register: pymilvus_check
+ changed_when: false
+ failed_when: false
+
+- name: Install Python Milvus client with pip
+ ansible.builtin.pip:
+ name:
+ - pymilvus>={{ ai_vector_db_milvus_version }}
+ state: present
+ extra_args: --break-system-packages
+ become: true
+ when: pymilvus_check.rc != 0 or pymilvus_check.stdout is version(ai_vector_db_milvus_version, '<')
+
+- name: Create benchmark scripts directory
+ ansible.builtin.file:
+ path: "{{ ai_vector_db_milvus_data_dir }}/scripts"
+ state: directory
+ mode: '0755'
+ register: scripts_dir_result
+
+- name: Check if benchmark scripts exist
+ ansible.builtin.stat:
+ path: "{{ ai_vector_db_milvus_data_dir }}/scripts/{{ item }}"
+ loop:
+ - milvus_benchmark.py
+ - milvus_utils.py
+ register: benchmark_scripts_check
+
+- name: Copy benchmark scripts
+ ansible.builtin.copy:
+ src: "{{ item.item }}"
+ dest: "{{ ai_vector_db_milvus_data_dir }}/scripts/"
+ mode: '0755'
+ loop: "{{ benchmark_scripts_check.results }}"
+ when: not item.stat.exists or scripts_dir_result is changed
+
+- name: Create initial connection test script
+ ansible.builtin.template:
+ src: test_connection.py.j2
+ dest: "{{ ai_vector_db_milvus_data_dir }}/scripts/test_connection.py"
+ mode: '0755'
diff --git a/playbooks/roles/milvus/tasks/install_docker.yml b/playbooks/roles/milvus/tasks/install_docker.yml
new file mode 100644
index 00000000..e1e1d911
--- /dev/null
+++ b/playbooks/roles/milvus/tasks/install_docker.yml
@@ -0,0 +1,97 @@
+---
+- name: Check if Docker packages are installed (Debian)
+ ansible.builtin.command: dpkg -l docker.io docker-compose
+ register: docker_packages_check
+ changed_when: false
+ failed_when: false
+ when: ansible_os_family == "Debian"
+
+- name: Install Docker and Python dependencies
+ ansible.builtin.package:
+ name:
+ - docker.io
+ - docker-compose
+ - python3-pip
+ - python3-setuptools
+ - python3-packaging
+ state: present
+ become: true
+ when:
+ - ansible_os_family == "Debian"
+ - docker_packages_check.rc != 0
+
+- name: Check if Docker packages are installed (RedHat)
+ # TODO: Consider using package_facts module instead of rpm command
+ ansible.builtin.command: rpm -q docker docker-compose
+ register: docker_packages_check_rh
+ changed_when: false
+ failed_when: false
+ when: ansible_os_family == "RedHat"
+
+- name: Install Docker and Python dependencies (RedHat)
+ ansible.builtin.package:
+ name:
+ - docker
+ - docker-compose
+ - python3-pip
+ - python3-setuptools
+ state: present
+ become: true
+ when:
+ - ansible_os_family == "RedHat"
+ - docker_packages_check_rh.rc != 0
+
+- name: Check if user is in docker group
+ ansible.builtin.shell: groups {{ data_user | default(ansible_user_id) }} | grep -q docker
+ register: user_docker_group_check
+ changed_when: false
+ failed_when: false
+
+- name: Add user to docker group
+ ansible.builtin.user:
+ name: "{{ data_user | default(ansible_user_id) }}"
+ groups: docker
+ append: true
+ become: true
+ when: user_docker_group_check.rc != 0
+
+- name: Ensure Docker service is started
+ ansible.builtin.systemd:
+ name: docker
+ state: started
+ enabled: true
+ become: true
+
+- name: Create Milvus directories
+ ansible.builtin.file:
+ path: "{{ item }}"
+ state: directory
+ mode: '0755'
+ owner: "{{ data_user | default(ansible_user_id) }}"
+ become: true
+ loop:
+ - "{{ ai_vector_db_milvus_data_dir }}"
+ - "{{ ai_vector_db_milvus_config_dir }}"
+ - "{{ ai_vector_db_milvus_log_dir }}"
+ - "{{ ai_vector_db_milvus_docker_data_path }}"
+ - "{{ ai_vector_db_milvus_docker_etcd_data_path }}"
+ - "{{ ai_vector_db_milvus_docker_minio_data_path }}"
+
+- name: Check if docker-compose.yml exists
+ ansible.builtin.stat:
+ path: "{{ ai_vector_db_milvus_config_dir }}/docker-compose.yml"
+ register: docker_compose_exists
+
+- name: Remove old docker-compose override file if exists
+ ansible.builtin.file:
+ path: "{{ ai_vector_db_milvus_config_dir }}/docker-compose.override.yml"
+ state: absent
+ become: true
+ when: not docker_compose_exists.stat.exists
+
+- name: Create Milvus docker-compose file
+ ansible.builtin.template:
+ src: docker-compose.yml.j2
+ dest: "{{ ai_vector_db_milvus_config_dir }}/docker-compose.yml"
+ mode: '0644'
+ become: true
diff --git a/playbooks/roles/milvus/tasks/main.yml b/playbooks/roles/milvus/tasks/main.yml
new file mode 100644
index 00000000..4088cb47
--- /dev/null
+++ b/playbooks/roles/milvus/tasks/main.yml
@@ -0,0 +1,52 @@
+---
+- name: Include role create_data_partition
+ ansible.builtin.include_role:
+ name: create_data_partition
+ tags: ['setup', 'data_partition']
+
+- name: Include role common
+ ansible.builtin.include_role:
+ name: common
+ when:
+ - infer_uid_and_group|bool
+
+- name: Ensure data_dir has correct ownership
+ tags: ['setup']
+ become: true
+ # become_method: sudo # sudo is the default, not needed
+ ansible.builtin.file:
+ path: "{{ data_path }}"
+ owner: "{{ data_user }}"
+ group: "{{ data_group }}"
+ recurse: false
+ state: directory
+ mode: '0755'
+
+- name: Ensure Milvus-specific subdirectories have correct ownership
+ tags: ['setup']
+ become: true
+ # become_method: sudo # sudo is the default, not needed
+ ansible.builtin.file:
+ path: "{{ item }}"
+ owner: "{{ data_user }}"
+ group: "{{ data_group }}"
+ recurse: true
+ state: directory
+ mode: '0755'
+ loop:
+ - "{{ data_path }}/milvus"
+ - "{{ ai_vector_db_milvus_docker_data_path | default(data_path + '/milvus/data') }}"
+ - "{{ ai_vector_db_milvus_docker_etcd_data_path | default(data_path + '/milvus/etcd') }}"
+ - "{{ ai_vector_db_milvus_docker_minio_data_path | default(data_path + '/milvus/minio') }}"
+ - "{{ data_path }}/ai-benchmark"
+ # TODO: Review - was ignore_errors: true
+ failed_when: false # Always succeed - review this condition
+
+- name: Include Docker installation tasks
+ ansible.builtin.include_tasks: install_docker.yml
+
+- name: Include setup tasks
+ ansible.builtin.include_tasks: setup.yml
+
+# Benchmarks are included via separate playbook call with proper tags
+# They are not run during the initial setup phase
diff --git a/playbooks/roles/milvus/tasks/setup.yml b/playbooks/roles/milvus/tasks/setup.yml
new file mode 100644
index 00000000..e9b8b6d5
--- /dev/null
+++ b/playbooks/roles/milvus/tasks/setup.yml
@@ -0,0 +1,107 @@
+---
+- name: Install Python virtual environment support
+ ansible.builtin.package:
+ name:
+ - python3-venv
+ - python3-pip
+ state: present
+ become: true
+
+- name: Check if virtual environment exists
+ ansible.builtin.stat:
+ path: "{{ data_path }}/ai-benchmark/venv"
+ register: venv_stat
+
+- name: Create Python virtual environment for AI benchmarks
+ ansible.builtin.command: python3 -m venv {{ data_path }}/ai-benchmark/venv
+ when: not venv_stat.stat.exists
+
+- name: Upgrade pip in virtual environment
+ ansible.builtin.command: "{{ data_path }}/ai-benchmark/venv/bin/python -m pip install --upgrade pip"
+ register: pip_upgrade
+ changed_when: "'Successfully installed' in pip_upgrade.stdout"
+
+- name: Install required Python packages in virtual environment
+ ansible.builtin.pip:
+ name:
+ - "pymilvus>={{ ai_vector_db_milvus_version }}"
+ - numpy
+ - pandas
+ - tqdm
+ virtualenv: "{{ data_path }}/ai-benchmark/venv"
+ state: present
+
+- name: Verify pymilvus is installed in virtual environment
+ ansible.builtin.command: "{{ data_path }}/ai-benchmark/venv/bin/python -c 'import pymilvus; print(pymilvus.__version__)'"
+ register: pymilvus_version
+ changed_when: false
+ failed_when: false
+
+- name: Display pymilvus version
+ ansible.builtin.debug:
+ msg: "pymilvus version: {{ pymilvus_version.stdout }}"
+ when: pymilvus_version.rc == 0
+
+- name: Check Docker Compose services status
+ ansible.builtin.shell: |
+ cd {{ ai_vector_db_milvus_config_dir }}
+ docker-compose ps --format json
+ when: ai_vector_db_milvus_docker | bool
+ become: true
+ register: docker_status_check
+ changed_when: false
+ failed_when: false
+
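+# Only bring the stack up if docker-compose does not already report it
+# running, so re-running setup stays idempotent.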
+- name: Start Milvus with Docker Compose
+ ansible.builtin.shell: |
+ cd {{ ai_vector_db_milvus_config_dir }}
+ docker-compose up -d
+ when: docker_status_check.rc != 0 or "running" not in docker_status_check.stdout
+ become: true
+ register: docker_compose_result
+ changed_when: "'Started' in docker_compose_result.stderr or 'Created' in docker_compose_result.stderr"
+
+- name: Wait for Milvus to be ready
+ ansible.builtin.wait_for:
+ port: "{{ ai_vector_db_milvus_port }}"
+ host: localhost
+ delay: 60
+ timeout: 300
+
+- name: Create benchmark scripts directory
+ ansible.builtin.file:
+ path: "{{ ai_vector_db_milvus_data_dir }}/scripts"
+ state: directory
+ mode: '0755'
+ register: scripts_dir_result
+
+- name: Check if benchmark scripts exist
+ ansible.builtin.stat:
+ path: "{{ ai_vector_db_milvus_data_dir }}/scripts/{{ item }}"
+ loop:
+ - milvus_benchmark.py
+ - milvus_utils.py
+ register: benchmark_scripts_check
+
+- name: Copy benchmark scripts
+ ansible.builtin.copy:
+ src: "{{ item.item }}"
+ dest: "{{ ai_vector_db_milvus_data_dir }}/scripts/"
+ mode: '0755'
+ loop: "{{ benchmark_scripts_check.results }}"
+ when: not item.stat.exists or scripts_dir_result is changed
+
+- name: Create initial connection test script
+ ansible.builtin.template:
+ src: test_connection.py.j2
+ dest: "{{ ai_vector_db_milvus_data_dir }}/scripts/test_connection.py"
+ mode: '0755'
+
+- name: Test Milvus connection
+ ansible.builtin.command: "{{ data_path }}/ai-benchmark/venv/bin/python {{ ai_vector_db_milvus_data_dir }}/scripts/test_connection.py"
+ register: connection_test
+ changed_when: false
+
+- name: Display connection test result
+ ansible.builtin.debug:
+ msg: "{{ connection_test.stdout }}"
diff --git a/playbooks/roles/milvus/templates/benchmark_config.json.j2 b/playbooks/roles/milvus/templates/benchmark_config.json.j2
new file mode 100644
index 00000000..f3ed04a0
--- /dev/null
+++ b/playbooks/roles/milvus/templates/benchmark_config.json.j2
@@ -0,0 +1,25 @@
+{
+ "milvus": {
+ "host": "localhost",
+ "port": {{ ai_vector_db_milvus_port }},
+ "collection_name": "{{ ai_vector_db_milvus_default_collection }}",
+ "dimension": {{ ai_vector_db_milvus_default_dim }},
+ "index_type": "{{ ai_vector_db_milvus_index_type }}",
+ "metric_type": "{{ ai_vector_db_milvus_metric_type }}",
+ "nlist": {{ ai_vector_db_milvus_nlist }},
+ "num_shards": {{ ai_vector_db_milvus_default_shards }}
+ },
+ "benchmark": {
+ "datasets": {{ ai_vector_db_milvus_benchmark_datasets | to_json }},
+ "batch_size": {{ ai_vector_db_milvus_benchmark_batch_size }},
+ "num_queries": {{ ai_vector_db_milvus_benchmark_num_queries }},
+ "top_k": [1, 10, 100],
+ "nprobe": [1, 10, 50, 100]
+ },
+ "filesystem": {
+ "type": "{{ ai_filesystem }}",
+ "mount_point": "{{ ai_data_device_path }}",
+ "mkfs_opts": "{{ ai_mkfs_opts | default('') }}",
+ "mount_opts": "{{ ai_mount_opts | default('defaults') }}"
+ }
+}
diff --git a/playbooks/roles/milvus/templates/docker-compose.override.yml.j2 b/playbooks/roles/milvus/templates/docker-compose.override.yml.j2
new file mode 100644
index 00000000..b4f96a44
--- /dev/null
+++ b/playbooks/roles/milvus/templates/docker-compose.override.yml.j2
@@ -0,0 +1,24 @@
+services:
+ milvus-standalone:
+ environment:
+ MILVUS_DATA_DIR: /var/lib/milvus
+ MILVUS_LOG_DIR: /var/log/milvus
+ volumes:
+ - {{ ai_vector_db_milvus_data_dir }}/volumes/milvus:/var/lib/milvus
+ - {{ ai_vector_db_milvus_log_dir }}:/var/log/milvus
+ ports:
+ - "{{ ai_vector_db_milvus_port }}:19530"
+ - "{{ ai_vector_db_milvus_metrics_port }}:9091"
+ deploy:
+ resources:
+ limits:
+ memory: {{ ai_vector_db_milvus_memory_limit }}
+ cpus: '{{ ai_vector_db_milvus_cpu_limit }}'
+
+ etcd:
+ volumes:
+ - {{ ai_vector_db_milvus_data_dir }}/volumes/etcd:/etcd
+
+ minio:
+ volumes:
+ - {{ ai_vector_db_milvus_data_dir }}/volumes/minio:/minio_data
diff --git a/playbooks/roles/milvus/templates/docker-compose.yml.j2 b/playbooks/roles/milvus/templates/docker-compose.yml.j2
new file mode 100644
index 00000000..6a611c51
--- /dev/null
+++ b/playbooks/roles/milvus/templates/docker-compose.yml.j2
@@ -0,0 +1,64 @@
+services:
+ etcd:
+ container_name: {{ ai_vector_db_milvus_etcd_container_name }}
+ image: {{ ai_vector_db_milvus_etcd_container_image_string }}
+ environment:
+ - ETCD_AUTO_COMPACTION_MODE=revision
+ - ETCD_AUTO_COMPACTION_RETENTION=1000
+ - ETCD_QUOTA_BACKEND_BYTES=4294967296
+ - ETCD_SNAPSHOT_COUNT=50000
+ volumes:
+ - {{ ai_vector_db_milvus_docker_etcd_data_path }}:/etcd
+ command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
+ # Health check disabled - etcd container doesn't have curl or etcdctl in PATH
+ # healthcheck:
+ # test: ["CMD", "curl", "-f", "http://localhost:2379/health"]
+ # interval: 30s
+ # timeout: 20s
+ # retries: 3
+ restart: unless-stopped
+
+ minio:
+ container_name: {{ ai_vector_db_milvus_minio_container_name }}
+ image: {{ ai_vector_db_milvus_minio_container_image_string }}
+ environment:
+ MINIO_ACCESS_KEY: {{ ai_vector_db_milvus_minio_access_key }}
+ MINIO_SECRET_KEY: {{ ai_vector_db_milvus_minio_secret_key }}
+ volumes:
+ - {{ ai_vector_db_milvus_docker_minio_data_path }}:/minio_data
+ command: minio server /minio_data --console-address ":{{ ai_vector_db_milvus_minio_console_port }}"
+ healthcheck:
+ test: ["CMD", "curl", "-f", "http://localhost:{{ ai_vector_db_milvus_minio_api_port }}/minio/health/live"]
+ interval: 30s
+ timeout: 20s
+ retries: 3
+ restart: unless-stopped
+ ports:
+ - "{{ ai_vector_db_milvus_minio_api_port }}:{{ ai_vector_db_milvus_minio_api_port }}"
+ - "{{ ai_vector_db_milvus_minio_console_port }}:{{ ai_vector_db_milvus_minio_console_port }}"
+
+ milvus:
+ container_name: {{ ai_vector_db_milvus_container_name }}
+ image: {{ ai_vector_db_milvus_container_image_string }}
+ command: ["milvus", "run", "standalone"]
+ environment:
+ ETCD_ENDPOINTS: etcd:{{ ai_vector_db_milvus_etcd_client_port }}
+ MINIO_ADDRESS: minio:{{ ai_vector_db_milvus_minio_api_port }}
+ volumes:
+ - {{ ai_vector_db_milvus_docker_data_path }}:/var/lib/milvus
+ depends_on:
+ - etcd
+ - minio
+ ports:
+ - "{{ ai_vector_db_milvus_port }}:19530"
+ - "{{ ai_vector_db_milvus_web_ui_port }}:9091"
+ restart: unless-stopped
+ deploy:
+ resources:
+ limits:
+ memory: {{ ai_vector_db_milvus_memory_limit }}
+ cpus: '{{ ai_vector_db_milvus_cpu_limit }}'
+
+networks:
+ default:
+ name: {{ ai_vector_db_milvus_docker_network_name }}
diff --git a/playbooks/roles/milvus/templates/milvus.yaml.j2 b/playbooks/roles/milvus/templates/milvus.yaml.j2
new file mode 100644
index 00000000..f843ec4b
--- /dev/null
+++ b/playbooks/roles/milvus/templates/milvus.yaml.j2
@@ -0,0 +1,30 @@
+# Milvus configuration file
+etcd:
+ endpoints:
+ - {{ ai_vector_db_milvus_etcd_native_client_url }}
+ rootPath: milvus
+
+minio:
+ address: localhost
+ port: 9000
+ accessKeyID: {{ ai_vector_db_milvus_minio_native_access_key }}
+ secretAccessKey: {{ ai_vector_db_milvus_minio_native_secret_key }}
+ bucketName: milvus-bucket
+ useSSL: false
+
+proxy:
+ port: {{ ai_vector_db_milvus_port }}
+
+log:
+ level: info
+ path: {{ ai_vector_db_milvus_native_log_path }}
+
+dataNode:
+ dataPath: {{ ai_vector_db_milvus_native_data_path }}
+
+indexNode:
+ enableDisk: true
+
+common:
+ security:
+ authorizationEnabled: false
diff --git a/playbooks/roles/milvus/templates/test_connection.py.j2 b/playbooks/roles/milvus/templates/test_connection.py.j2
new file mode 100644
index 00000000..d85423ba
--- /dev/null
+++ b/playbooks/roles/milvus/templates/test_connection.py.j2
@@ -0,0 +1,25 @@
+#!{{ data_path }}/ai-benchmark/venv/bin/python
+"""Test Milvus connection"""
+
+from pymilvus import connections, utility
+
+try:
+ # Connect to Milvus
+ connections.connect(
+ alias="default",
+ host="localhost",
+ port="{{ ai_vector_db_milvus_port }}"
+ )
+
+ # Check if connected
+ if utility.list_collections():
+ print("✓ Successfully connected to Milvus")
+ print(f" Server version: {utility.get_server_version()}")
+ print(f" Collections: {utility.list_collections()}")
+ else:
+ print("✓ Successfully connected to Milvus (no collections yet)")
+ print(f" Server version: {utility.get_server_version()}")
+
+except Exception as e:
+ print(f"✗ Failed to connect to Milvus: {e}")
+ exit(1)
diff --git a/workflows/Makefile b/workflows/Makefile
index b5f54ff5..fe35707b 100644
--- a/workflows/Makefile
+++ b/workflows/Makefile
@@ -66,6 +66,10 @@ ifeq (y,$(CONFIG_KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS))
include workflows/fio-tests/Makefile
endif # CONFIG_KDEVOPS_WORKFLOW_ENABLE_FIO_TESTS == y
+ifeq (y,$(CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI))
+include workflows/ai/Makefile
+endif # CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI == y
+
ANSIBLE_EXTRA_ARGS += $(WORKFLOW_ARGS)
ANSIBLE_EXTRA_ARGS_SEPARATED += $(WORKFLOW_ARGS_SEPARATED)
ANSIBLE_EXTRA_ARGS_DIRECT += $(WORKFLOW_ARGS_DIRECT)
diff --git a/workflows/ai/Kconfig b/workflows/ai/Kconfig
new file mode 100644
index 00000000..2ffc6b65
--- /dev/null
+++ b/workflows/ai/Kconfig
@@ -0,0 +1,164 @@
+if KDEVOPS_WORKFLOW_ENABLE_AI
+
+choice
+ prompt "What type of AI testing do you want to run?"
+ default AI_TESTS_VECTOR_DATABASE
+
+config AI_TESTS_VECTOR_DATABASE
+ bool "Vector database performance tests"
+ select KDEVOPS_BASELINE_AND_DEV
+ output yaml
+ help
+ Run vector database performance analysis tests.
+ This includes testing various vector dimensions, batch sizes,
+ and query patterns to generate performance profiles for AI workloads.
+
+ A/B testing is enabled to compare performance across different
+ configurations using baseline and development nodes.
+
+endchoice
+
+# Vector Database Configuration
+if AI_TESTS_VECTOR_DATABASE
+
+choice
+ prompt "Select vector database system"
+ default AI_VECTOR_DB_MILVUS
+
+config AI_VECTOR_DB_MILVUS
+ bool "Milvus - Open-source vector database"
+ output yaml
+ help
+ Milvus is a cloud-native vector database built for scalable
+ similarity search and AI applications. It provides high
+ performance vector indexing and querying capabilities.
+
+endchoice
+
+# Milvus-specific configuration
+if AI_VECTOR_DB_MILVUS
+
+# CLI override support for CI testing
+config AI_VECTOR_DB_MILVUS_QUICK_TEST_SET_BY_CLI
+ bool
+ output yaml
+ default $(shell, scripts/check-cli-set-var.sh AI_VECTOR_DB_MILVUS_QUICK_TEST)
+
+config AI_VECTOR_DB_MILVUS_QUICK_TEST
+ bool "Enable quick test mode for CI/demo"
+ default y if AI_VECTOR_DB_MILVUS_QUICK_TEST_SET_BY_CLI
+ output yaml
+ help
+ Quick test mode reduces dataset sizes and runtime for rapid validation.
+ This is useful for CI pipelines and demonstrations.
+
+# Milvus runs in Docker containers only
+config AI_VECTOR_DB_MILVUS_DOCKER
+ bool
+ output yaml
+ default y
+ help
+ Milvus runs inside Docker containers with embedded etcd and MinIO storage.
+ Native installation is not supported due to complex build requirements.
+
+config AI_VECTOR_DB_MILVUS_VERSION
+ string "Milvus version"
+ output yaml
+ default "2.3.0"
+ help
+ The version of Milvus to install and use.
+
+config AI_VECTOR_DB_MILVUS_PORT
+ int "Milvus server port"
+ output yaml
+ default 19530
+ help
+ The port number where Milvus server is listening.
+ Default is 19530 for standard Milvus deployment.
+
+config AI_VECTOR_DB_MILVUS_COLLECTION_NAME
+ string "Default collection name"
+ output yaml
+ default "benchmark_collection"
+ help
+ The default collection name to use for benchmarking tests.
+
+config AI_VECTOR_DB_MILVUS_DIMENSION
+ int "Vector dimension"
+ output yaml
+ default 768
+ range 1 4096
+ help
+ The dimension of vectors to use in benchmarks.
+ Common dimensions: 128, 384, 768, 1536
+
+config AI_VECTOR_DB_MILVUS_DATASET_SIZE
+ int "Dataset size (number of vectors)"
+ output yaml
+ default 100000 if AI_VECTOR_DB_MILVUS_QUICK_TEST
+ default 1000000 if !AI_VECTOR_DB_MILVUS_QUICK_TEST
+ help
+ The number of vectors to insert for benchmarking.
+ Quick test mode uses a smaller dataset for faster execution.
+
+config AI_VECTOR_DB_MILVUS_BATCH_SIZE
+ int "Batch size for insertions"
+ output yaml
+ default 10000
+ help
+ The batch size to use when inserting vectors.
+
+config AI_VECTOR_DB_MILVUS_NUM_QUERIES
+ int "Number of search queries"
+ output yaml
+ default 1000 if AI_VECTOR_DB_MILVUS_QUICK_TEST
+ default 10000
+ help
+ The number of search queries to execute during benchmarking.
+
+if AI_VECTOR_DB_MILVUS_DOCKER
+source "workflows/ai/Kconfig.docker"
+endif # AI_VECTOR_DB_MILVUS_DOCKER
+
+if AI_VECTOR_DB_MILVUS_NATIVE
+source "workflows/ai/Kconfig.native"
+endif # AI_VECTOR_DB_MILVUS_NATIVE
+
+endif # AI_VECTOR_DB_MILVUS
+
+endif # AI_TESTS_VECTOR_DATABASE
+
+# Common AI Benchmark Configuration
+config AI_BENCHMARK_RESULTS_DIR
+ string "Benchmark results directory"
+ output yaml
+ default "/data/ai-benchmark"
+ help
+ Directory where benchmark results will be stored.
+
+config AI_BENCHMARK_ENABLE_GRAPHING
+ bool "Enable performance graphing"
+ output yaml
+ default y
+ help
+ Generate performance graphs and visualizations from benchmark results.
+
+config AI_BENCHMARK_ITERATIONS
+ int "Number of benchmark iterations"
+ output yaml
+ default 3 if AI_VECTOR_DB_MILVUS_QUICK_TEST
+ default 40 if !AI_VECTOR_DB_MILVUS_QUICK_TEST
+ range 1 100
+ help
+ The number of iterations to run for each benchmark configuration.
+ Multiple iterations help ensure consistent results. The default of
+ 40 will use about 100 GiB of storage space with 1,000,000 vectors,
+ which fits the existing kdevops defaults for target nodes, as our
+ minimum extra drive size is 100 GiB. Expect about one full day of
+ testing. If you want more than 40 iterations, be sure to size your
+ storage drive accordingly.
+
+# Docker storage configuration
+source "workflows/ai/Kconfig.docker-storage"
+
+endif # KDEVOPS_WORKFLOW_ENABLE_AI
diff --git a/workflows/ai/Kconfig.docker b/workflows/ai/Kconfig.docker
new file mode 100644
index 00000000..012fc0b9
--- /dev/null
+++ b/workflows/ai/Kconfig.docker
@@ -0,0 +1,172 @@
+choice
+ prompt "Which Milvus container image to use?"
+ default AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5
+
+config AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5
+ bool "milvusdb/milvus:v2.5.10"
+ output yaml
+ help
+ Use the latest stable Milvus 2.5.x release with enhanced
+ performance and stability features.
+
+config AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_4
+ bool "milvusdb/milvus:v2.4.17"
+ output yaml
+ help
+ Use Milvus 2.4.x for compatibility with existing workloads
+ or when specific 2.4 features are required.
+
+endchoice
+
+config AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_STRING
+ string
+ output yaml
+ default "milvusdb/milvus:v2.5.10" if AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5
+ default "milvusdb/milvus:v2.4.17" if AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_4
+
+config AI_VECTOR_DB_MILVUS_CONTAINER_NAME
+ string "The local Milvus container name"
+ default "milvus-ai-benchmark"
+ output yaml
+ help
+ Set the name for the Milvus Docker container.
+
+config AI_VECTOR_DB_MILVUS_ETCD_CONTAINER_IMAGE_STRING
+ string "etcd container image"
+ output yaml
+ default "quay.io/coreos/etcd:v3.5.18" if AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5
+ default "quay.io/coreos/etcd:v3.5.5" if AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_4
+ help
+ The etcd container image to use for Milvus metadata storage.
+
+config AI_VECTOR_DB_MILVUS_ETCD_CONTAINER_NAME
+ string "The local etcd container name"
+ default "milvus-etcd"
+ output yaml
+ help
+ Set the name for the etcd Docker container.
+
+config AI_VECTOR_DB_MILVUS_MINIO_CONTAINER_IMAGE_STRING
+ string "MinIO container image"
+ output yaml
+ default "minio/minio:RELEASE.2023-03-20T20-16-18Z"
+ help
+ The MinIO container image to use for Milvus object storage.
+
+config AI_VECTOR_DB_MILVUS_MINIO_CONTAINER_NAME
+ string "The local MinIO container name"
+ default "milvus-minio"
+ output yaml
+ help
+ Set the name for the MinIO Docker container.
+
+config AI_VECTOR_DB_MILVUS_MINIO_ACCESS_KEY
+ string "MinIO access key"
+ output yaml
+ default "minioadmin"
+ help
+ Access key for MinIO object storage.
+
+config AI_VECTOR_DB_MILVUS_MINIO_SECRET_KEY
+ string "MinIO secret key"
+ output yaml
+ default "minioadmin"
+ help
+ Secret key for MinIO object storage.
+
+config AI_VECTOR_DB_MILVUS_DOCKER_DATA_PATH
+ string "Host path for persistent data storage"
+ output yaml
+ default "/data/milvus/data"
+ help
+ Directory on the host where Milvus data will be persisted.
+ This includes vector data, metadata, and logs.
+
+config AI_VECTOR_DB_MILVUS_DOCKER_ETCD_DATA_PATH
+ string "Host path for etcd data storage"
+ output yaml
+ default "/data/milvus/etcd"
+ help
+ Directory on the host where etcd data will be persisted.
+
+config AI_VECTOR_DB_MILVUS_DOCKER_MINIO_DATA_PATH
+ string "Host path for MinIO data storage"
+ output yaml
+ default "/data/milvus/minio"
+ help
+ Directory on the host where MinIO data will be persisted.
+
+config AI_VECTOR_DB_MILVUS_DOCKER_NETWORK_NAME
+ string "Docker network name"
+ output yaml
+ default "milvus-network"
+ help
+ Name of the Docker network to create for Milvus containers.
+
+config AI_VECTOR_DB_MILVUS_WEB_UI_PORT
+ int "Milvus web UI port"
+ output yaml
+ default "9091"
+ help
+ Port for accessing the Milvus web UI interface.
+
+config AI_VECTOR_DB_MILVUS_MINIO_API_PORT
+ int "MinIO API port"
+ output yaml
+ default "9000"
+ help
+ Port for MinIO API access.
+
+config AI_VECTOR_DB_MILVUS_MINIO_CONSOLE_PORT
+ int "MinIO console port"
+ output yaml
+ default "9001"
+ help
+ Port for MinIO web console access.
+
+config AI_VECTOR_DB_MILVUS_ETCD_CLIENT_PORT
+ int "etcd client port"
+ output yaml
+ default "2379"
+ help
+ Port for etcd client connections.
+
+config AI_VECTOR_DB_MILVUS_ETCD_PEER_PORT
+ int "etcd peer port"
+ output yaml
+ default "2380"
+ help
+ Port for etcd peer connections.
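+# Summary of the default port layout above: Milvus web UI 9091, MinIO API
+# 9000, MinIO console 9001, etcd client 2379, etcd peer 2380. Milvus itself
+# additionally serves gRPC on its standard 19530 port, which is not
+# configured here.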
+
+menu "Docker resource limits"
+
+config AI_VECTOR_DB_MILVUS_MEMORY_LIMIT
+ string "Milvus container memory limit"
+ output yaml
+ default "8g"
+ help
+ Memory limit for the Milvus container. Adjust based on
+ your system resources and dataset size.
+
+config AI_VECTOR_DB_MILVUS_CPU_LIMIT
+ string "Milvus container CPU limit"
+ output yaml
+ default "4.0"
+ help
+ CPU limit for the Milvus container (number of CPUs).
+
+config AI_VECTOR_DB_MILVUS_ETCD_MEMORY_LIMIT
+ string "etcd container memory limit"
+ output yaml
+ default "1g"
+ help
+ Memory limit for the etcd container.
+
+config AI_VECTOR_DB_MILVUS_MINIO_MEMORY_LIMIT
+ string "MinIO container memory limit"
+ output yaml
+ default "2g"
+ help
+ Memory limit for the MinIO container.
+
+endmenu
diff --git a/workflows/ai/Kconfig.docker-storage b/workflows/ai/Kconfig.docker-storage
new file mode 100644
index 00000000..33efce4f
--- /dev/null
+++ b/workflows/ai/Kconfig.docker-storage
@@ -0,0 +1,201 @@
+menu "Docker Storage Configuration for AI Workloads"
+
+config AI_DOCKER_STORAGE_ENABLE
+ bool "Enable dedicated Docker storage for AI workloads"
+ default y
+ output yaml
+ help
+ Configure a dedicated storage device for Docker containers
+ and images used in AI workloads. This prevents Docker from
+ filling up the root filesystem and provides better performance
+ isolation for container operations.
+
+ When enabled, Docker data will be stored on a dedicated device
+ and filesystem optimized for container workloads.
+
+if AI_DOCKER_STORAGE_ENABLE
+
+config AI_DOCKER_DEVICE
+ string "Device to use for Docker storage"
+ output yaml
+ default "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops1" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_NVME
+ default "/dev/disk/by-id/virtio-kdevops1" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_VIRTIO
+ default "/dev/disk/by-id/ata-QEMU_HARDDISK_kdevops1" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_IDE
+ default "/dev/nvme2n1" if TERRAFORM_AWS_INSTANCE_M5AD_2XLARGE
+ default "/dev/nvme2n1" if TERRAFORM_AWS_INSTANCE_M5AD_4XLARGE
+ default "/dev/nvme1n1" if TERRAFORM_GCE
+ default "/dev/sdd" if TERRAFORM_AZURE
+ default TERRAFORM_OCI_SPARSE_VOLUME_DEVICE_FILE_NAME if TERRAFORM_OCI
+ help
+ The device to use for Docker storage. This device will be
+ formatted and mounted to store Docker containers, images,
+ and volumes for AI workloads.
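+# Illustrative end state (a sketch, not the exact playbook tasks): the device
+# selected here is formatted with the filesystem chosen below and mounted on
+# the mount point below, e.g.
+#   mkfs.xfs -f /dev/disk/by-id/virtio-kdevops1
+#   mount /dev/disk/by-id/virtio-kdevops1 /var/lib/docker/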
+
+config AI_DOCKER_MOUNT_POINT
+ string "Mount point for Docker storage"
+ output yaml
+ default "/var/lib/docker/"
+ help
+ The path where the Docker storage filesystem will be mounted.
+ If this differs from /var/lib/docker, Docker is configured to
+ use it via a symlink from /var/lib/docker.
+
+choice
+ prompt "Docker storage filesystem"
+ default AI_DOCKER_FSTYPE_XFS
+
+config AI_DOCKER_FSTYPE_XFS
+ bool "XFS"
+ help
+ Use XFS filesystem for Docker storage. XFS provides excellent
+ performance for large files and is recommended for production
+ Docker deployments. Supports various block sizes for testing
+ large block size (LBS) configurations.
+
+config AI_DOCKER_FSTYPE_BTRFS
+ bool "Btrfs"
+ help
+ Use Btrfs filesystem for Docker storage. Btrfs provides
+ advanced features like snapshots and compression, which can
+ be beneficial for Docker layer management.
+
+config AI_DOCKER_FSTYPE_EXT4
+ bool "ext4"
+ help
+ Use ext4 filesystem for Docker storage. Ext4 is a mature
+ and reliable filesystem with good all-around performance.
+
+endchoice
+
+config AI_DOCKER_FSTYPE
+ string
+ output yaml
+ default "xfs" if AI_DOCKER_FSTYPE_XFS
+ default "btrfs" if AI_DOCKER_FSTYPE_BTRFS
+ default "ext4" if AI_DOCKER_FSTYPE_EXT4
+
+if AI_DOCKER_FSTYPE_XFS
+
+choice
+ prompt "XFS block size configuration"
+ default AI_DOCKER_XFS_BLOCKSIZE_4K
+
+config AI_DOCKER_XFS_BLOCKSIZE_4K
+ bool "4K block size (default)"
+ help
+ Use 4K (4096 bytes) block size. This is the default and most
+ compatible configuration.
+
+config AI_DOCKER_XFS_BLOCKSIZE_8K
+ bool "8K block size"
+ help
+ Use 8K (8192 bytes) block size for improved performance with
+ larger I/O operations.
+
+config AI_DOCKER_XFS_BLOCKSIZE_16K
+ bool "16K block size (LBS)"
+ help
+ Use 16K (16384 bytes) block size. This is a large block size
+ configuration that may require kernel LBS support.
+
+config AI_DOCKER_XFS_BLOCKSIZE_32K
+ bool "32K block size (LBS)"
+ help
+ Use 32K (32768 bytes) block size. This is a large block size
+ configuration that requires kernel LBS support.
+
+config AI_DOCKER_XFS_BLOCKSIZE_64K
+ bool "64K block size (LBS)"
+ help
+ Use 64K (65536 bytes) block size. This is the maximum XFS block
+ size and requires kernel LBS support.
+
+endchoice
+
+config AI_DOCKER_XFS_BLOCKSIZE
+ int
+ output yaml
+ default 4096 if AI_DOCKER_XFS_BLOCKSIZE_4K
+ default 8192 if AI_DOCKER_XFS_BLOCKSIZE_8K
+ default 16384 if AI_DOCKER_XFS_BLOCKSIZE_16K
+ default 32768 if AI_DOCKER_XFS_BLOCKSIZE_32K
+ default 65536 if AI_DOCKER_XFS_BLOCKSIZE_64K
+
+choice
+ prompt "XFS sector size"
+ default AI_DOCKER_XFS_SECTORSIZE_4K
+
+config AI_DOCKER_XFS_SECTORSIZE_4K
+ bool "4K sector size (default)"
+ help
+ Use 4K (4096 bytes) sector size. This is the standard
+ configuration for most modern drives.
+
+config AI_DOCKER_XFS_SECTORSIZE_512
+ bool "512 byte sector size"
+ depends on AI_DOCKER_XFS_BLOCKSIZE_4K
+ help
+ Use legacy 512 byte sector size. Only available with 4K block size.
+
+config AI_DOCKER_XFS_SECTORSIZE_8K
+ bool "8K sector size"
+ depends on AI_DOCKER_XFS_BLOCKSIZE_8K || AI_DOCKER_XFS_BLOCKSIZE_16K || AI_DOCKER_XFS_BLOCKSIZE_32K || AI_DOCKER_XFS_BLOCKSIZE_64K
+ help
+ Use 8K (8192 bytes) sector size. Requires block size >= 8K.
+
+config AI_DOCKER_XFS_SECTORSIZE_16K
+ bool "16K sector size (LBS)"
+ depends on AI_DOCKER_XFS_BLOCKSIZE_16K || AI_DOCKER_XFS_BLOCKSIZE_32K || AI_DOCKER_XFS_BLOCKSIZE_64K
+ help
+ Use 16K (16384 bytes) sector size. Requires block size >= 16K
+ and kernel LBS support.
+
+config AI_DOCKER_XFS_SECTORSIZE_32K
+ bool "32K sector size (LBS)"
+ depends on AI_DOCKER_XFS_BLOCKSIZE_32K || AI_DOCKER_XFS_BLOCKSIZE_64K
+ help
+ Use 32K (32768 bytes) sector size. Requires block size >= 32K
+ and kernel LBS support.
+
+endchoice
+
+config AI_DOCKER_XFS_SECTORSIZE
+ int
+ output yaml
+ default 512 if AI_DOCKER_XFS_SECTORSIZE_512
+ default 4096 if AI_DOCKER_XFS_SECTORSIZE_4K
+ default 8192 if AI_DOCKER_XFS_SECTORSIZE_8K
+ default 16384 if AI_DOCKER_XFS_SECTORSIZE_16K
+ default 32768 if AI_DOCKER_XFS_SECTORSIZE_32K
+
+config AI_DOCKER_XFS_MKFS_OPTS
+ string "Additional XFS mkfs options for Docker storage"
+ output yaml
+ default ""
+ help
+ Additional options to pass to mkfs.xfs when creating the Docker
+ storage filesystem. Block and sector sizes are configured above.
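+# Illustrative resulting invocation for a 16k block / 4k sector selection
+# (a sketch; the playbook composes the real command from the options above):
+#   mkfs.xfs -f -b size=16384 -s size=4096 <device>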
+
+endif # AI_DOCKER_FSTYPE_XFS
+
+config AI_DOCKER_BTRFS_MKFS_OPTS
+ string "Btrfs mkfs options for Docker storage"
+ output yaml
+ default "-f"
+ depends on AI_DOCKER_FSTYPE_BTRFS
+ help
+ Options to pass to mkfs.btrfs when creating the Docker storage
+ filesystem.
+
+config AI_DOCKER_EXT4_MKFS_OPTS
+ string "ext4 mkfs options for Docker storage"
+ output yaml
+ default "-F"
+ depends on AI_DOCKER_FSTYPE_EXT4
+ help
+ Options to pass to mkfs.ext4 when creating the Docker storage
+ filesystem.
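+# Illustrative equivalents for the non-XFS choices, using the defaults above
+# (sketches only):
+#   mkfs.btrfs -f <device>
+#   mkfs.ext4 -F <device>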
+
+endif # AI_DOCKER_STORAGE_ENABLE
+
+endmenu
diff --git a/workflows/ai/Kconfig.native b/workflows/ai/Kconfig.native
new file mode 100644
index 00000000..ef9768c3
--- /dev/null
+++ b/workflows/ai/Kconfig.native
@@ -0,0 +1,184 @@
+choice
+ prompt "Native Milvus installation method"
+ default AI_VECTOR_DB_MILVUS_NATIVE_BINARY
+
+config AI_VECTOR_DB_MILVUS_NATIVE_BINARY
+ bool "Install from pre-built binaries"
+ output yaml
+ help
+ Install Milvus from official pre-built binaries. This is
+ the recommended approach for production deployments and
+ provides optimal performance.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_SOURCE
+ bool "Build from source"
+ output yaml
+ help
+ Build Milvus from source code. This allows for custom
+ optimizations but requires longer build times and more
+ dependencies.
+
+endchoice
+
+config AI_VECTOR_DB_MILVUS_NATIVE_VERSION
+ string "Milvus version to install"
+ output yaml
+ default "v2.5.10" if AI_VECTOR_DB_MILVUS_NATIVE_BINARY
+ default "master" if AI_VECTOR_DB_MILVUS_NATIVE_SOURCE
+ help
+ The Milvus version to install. For binary installation,
+ use release tags like v2.5.10. For source builds, you
+ can use branch names or commit hashes.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_INSTALL_PATH
+ string "Installation directory"
+ output yaml
+ default "/opt/milvus"
+ help
+ Directory where Milvus will be installed.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_DATA_PATH
+ string "Data storage directory"
+ output yaml
+ default "/data/milvus"
+ help
+ Directory where Milvus will store vector data and metadata.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_LOG_PATH
+ string "Log directory"
+ output yaml
+ default "/var/log/milvus"
+ help
+ Directory where Milvus will write log files.
+
+menu "Native dependencies configuration"
+
+config AI_VECTOR_DB_MILVUS_ETCD_NATIVE_INSTALL
+ bool "Install etcd natively"
+ output yaml
+ default y
+ help
+ Install etcd as a native service for Milvus metadata storage.
+
+if AI_VECTOR_DB_MILVUS_ETCD_NATIVE_INSTALL
+
+config AI_VECTOR_DB_MILVUS_ETCD_NATIVE_VERSION
+ string "etcd version"
+ output yaml
+ default "v3.5.18"
+ help
+ Version of etcd to install.
+
+config AI_VECTOR_DB_MILVUS_ETCD_NATIVE_DATA_DIR
+ string "etcd data directory"
+ output yaml
+ default "/data/etcd"
+ help
+ Directory where etcd will store its data.
+
+config AI_VECTOR_DB_MILVUS_ETCD_NATIVE_CLIENT_URL
+ string "etcd client URL"
+ output yaml
+ default "http://localhost:2379"
+ help
+ URL for etcd client connections.
+
+endif # AI_VECTOR_DB_MILVUS_ETCD_NATIVE_INSTALL
+
+config AI_VECTOR_DB_MILVUS_MINIO_NATIVE_INSTALL
+ bool "Install MinIO natively"
+ output yaml
+ default y
+ help
+ Install MinIO as a native service for Milvus object storage.
+
+if AI_VECTOR_DB_MILVUS_MINIO_NATIVE_INSTALL
+
+config AI_VECTOR_DB_MILVUS_MINIO_NATIVE_VERSION
+ string "MinIO version"
+ output yaml
+ default "RELEASE.2023-03-20T20-16-18Z"
+ help
+ Version of MinIO to install.
+
+config AI_VECTOR_DB_MILVUS_MINIO_NATIVE_DATA_DIR
+ string "MinIO data directory"
+ output yaml
+ default "/data/minio"
+ help
+ Directory where MinIO will store object data.
+
+config AI_VECTOR_DB_MILVUS_MINIO_NATIVE_ACCESS_KEY
+ string "MinIO access key"
+ output yaml
+ default "minioadmin"
+ help
+ Access key for MinIO authentication.
+
+config AI_VECTOR_DB_MILVUS_MINIO_NATIVE_SECRET_KEY
+ string "MinIO secret key"
+ output yaml
+ default "minioadmin"
+ help
+ Secret key for MinIO authentication.
+
+endif # AI_VECTOR_DB_MILVUS_MINIO_NATIVE_INSTALL
+
+endmenu
+
+menu "Native service configuration"
+
+config AI_VECTOR_DB_MILVUS_NATIVE_USER
+ string "Milvus service user"
+ output yaml
+ default "milvus"
+ help
+ System user to run the Milvus service.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_GROUP
+ string "Milvus service group"
+ output yaml
+ default "milvus"
+ help
+ System group for the Milvus service.
+
+config AI_VECTOR_DB_MILVUS_NATIVE_ENABLE_SYSTEMD
+ bool "Create systemd service files"
+ output yaml
+ default y
+ help
+ Create systemd service files for automatic startup and
+ service management.
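+# Minimal sketch of such a unit (illustrative only; the binary path and the
+# standalone run command are assumptions, not the generated file):
+#   [Service]
+#   User=milvus
+#   ExecStart=/opt/milvus/bin/milvus run standalone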
+
+endmenu
+
+if AI_VECTOR_DB_MILVUS_NATIVE_SOURCE
+
+menu "Source build configuration"
+
+config AI_VECTOR_DB_MILVUS_BUILD_DEPENDENCIES
+ bool "Install build dependencies"
+ output yaml
+ default y
+ help
+ Automatically install required build dependencies including
+ Go compiler, CMake, and other development tools.
+
+config AI_VECTOR_DB_MILVUS_BUILD_JOBS
+ int "Number of parallel build jobs"
+ output yaml
+ default 0
+ help
+ Number of parallel jobs for building Milvus. Set to 0
+ to use all available CPU cores.
+
+config AI_VECTOR_DB_MILVUS_BUILD_TYPE
+ string "Build type"
+ output yaml
+ default "Release"
+ help
+ CMake build type. Options: Release, Debug, RelWithDebInfo.
+
+endmenu
+
+endif # AI_VECTOR_DB_MILVUS_NATIVE_SOURCE
diff --git a/workflows/ai/Makefile b/workflows/ai/Makefile
new file mode 100644
index 00000000..1c297edd
--- /dev/null
+++ b/workflows/ai/Makefile
@@ -0,0 +1,160 @@
+PHONY += ai ai-baseline ai-dev ai-results ai-results-baseline ai-results-dev
+PHONY += ai-setup ai-uninstall ai-destroy ai-help-menu
+PHONY += ai-tests ai-tests-baseline ai-tests-dev
+PHONY += ai-tests-results
+
+ifeq (y,$(CONFIG_WORKFLOWS_DEDICATED_WORKFLOW))
+export KDEVOPS_HOSTS_TEMPLATE := hosts.j2
+endif
+
+export AI_DATA_TARGET := $(subst ",,$(CONFIG_AI_BENCHMARK_RESULTS_DIR))
+export AI_ARGS :=
+
+AI_ARGS += ai_benchmark_results_dir='$(AI_DATA_TARGET)'
+
+# Vector Database Configuration
+ifeq (y,$(CONFIG_AI_TESTS_VECTOR_DATABASE))
+AI_ARGS += ai_tests_vector_database=True
+else
+AI_ARGS += ai_tests_vector_database=False
+endif
+
+# Milvus-specific Configuration
+ifeq (y,$(CONFIG_AI_VECTOR_DB_MILVUS))
+AI_ARGS += ai_vector_db_milvus_enable=True
+AI_ARGS += ai_vector_db_milvus_docker=True
+else
+AI_ARGS += ai_vector_db_milvus_enable=False
+endif
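+
+# Illustrative expansion for a Milvus defconfig (the results directory is an
+# example value taken from CONFIG_AI_BENCHMARK_RESULTS_DIR):
+#   AI_ARGS = ai_benchmark_results_dir='/data/ai-benchmark-results' \
+#             ai_tests_vector_database=True ai_vector_db_milvus_enable=True ...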
+
+AI_MANUAL_ARGS :=
+
+export AI_ARGS_SEPARATED := $(subst $(space),$(comma),$(AI_ARGS))
+
+# Main AI workflow targets
+ai: $(KDEVOPS_NODES) $(ANSIBLE_INVENTORY_FILE)
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai.yml \
+ -f 10 \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+
+ai-baseline:
+ $(Q)$(MAKE) ai HOSTS="baseline"
+
+ai-dev:
+ $(Q)$(MAKE) ai HOSTS="dev"
+
+# AI Testing/Benchmark targets
+ai-tests: $(KDEVOPS_NODES) $(ANSIBLE_INVENTORY_FILE)
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_tests.yml \
+ -f 10 \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+ $(Q)$(MAKE) ai-results
+
+ai-tests-baseline: $(KDEVOPS_NODES) $(ANSIBLE_INVENTORY_FILE)
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -l baseline \
+ -i hosts \
+ playbooks/ai_tests.yml \
+ -f 10 \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)"
+ $(Q)$(MAKE) ai-results-baseline
+
+ai-tests-dev: $(KDEVOPS_NODES) $(ANSIBLE_INVENTORY_FILE)
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -l dev \
+ -i hosts \
+ playbooks/ai_tests.yml \
+ -f 10 \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)"
+ $(Q)$(MAKE) ai-results-dev
+
+# Target to only run results analysis and graph generation
+ai-tests-results:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_tests.yml \
+ -f 10 \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ --tags="results" \
+ $(LIMIT_HOSTS)
+
+# Results collection targets
+ai-results:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_results.yml \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+
+ai-results-baseline:
+ $(Q)$(MAKE) ai-results HOSTS="baseline"
+
+ai-results-dev:
+ $(Q)$(MAKE) ai-results HOSTS="dev"
+
+ai-setup:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_setup.yml \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+
+ai-uninstall:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_uninstall.yml \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+
+ai-destroy:
+ $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+ -i hosts \
+ playbooks/ai_destroy.yml \
+ --extra-vars=@$(KDEVOPS_EXTRA_VARS) \
+ --extra-vars="$(AI_ARGS) $(AI_MANUAL_ARGS)" \
+ $(LIMIT_HOSTS)
+
+ai-help-menu:
+ @echo "AI workflow targets:"
+ @echo ""
+ @echo "Setup targets:"
+ @echo " ai - Setup AI infrastructure (installs and starts services)"
+ @echo " ai-baseline - Setup AI infrastructure on baseline nodes only"
+ @echo " ai-dev - Setup AI infrastructure on dev nodes only"
+ @echo ""
+ @echo "Testing/Benchmark targets:"
+ @echo " ai-tests - Run AI benchmarks on all nodes"
+ @echo " ai-tests-baseline - Run AI benchmarks on baseline nodes only"
+ @echo " ai-tests-dev - Run AI benchmarks on dev nodes only"
+ @echo " ai-tests-results - Only run results analysis and graph generation"
+ @echo ""
+ @echo "Results collection:"
+ @echo " ai-results - Collect and analyze AI benchmark results"
+ @echo " ai-results-baseline - Collect results from baseline nodes only"
+ @echo " ai-results-dev - Collect results from dev nodes only"
+ @echo ""
+ @echo "Other targets:"
+ @echo " ai-setup - Legacy target (use 'make ai' instead)"
+ @echo " ai-uninstall - Uninstall AI benchmark components"
+ @echo " ai-destroy - Destroy AI benchmark environment"
+ @echo ""
+
+HELP_TARGETS += ai-help-menu
+
+EXTRA_VAR_INPUTS += AI_ARGS_SEPARATED
+
+.PHONY: $(PHONY)
diff --git a/workflows/ai/scripts/analysis_config.json b/workflows/ai/scripts/analysis_config.json
new file mode 100644
index 00000000..2f90f4d5
--- /dev/null
+++ b/workflows/ai/scripts/analysis_config.json
@@ -0,0 +1,6 @@
+{
+ "enable_graphing": true,
+ "graph_format": "png",
+ "graph_dpi": 150,
+ "graph_theme": "seaborn"
+}
diff --git a/workflows/ai/scripts/analyze_results.py b/workflows/ai/scripts/analyze_results.py
new file mode 100755
index 00000000..3d11fb11
--- /dev/null
+++ b/workflows/ai/scripts/analyze_results.py
@@ -0,0 +1,979 @@
+#!/usr/bin/env python3
+"""
+AI Benchmark Results Analysis and Visualization
+
+This script analyzes benchmark results and generates comprehensive graphs
+showing performance characteristics of the AI workload testing.
+"""
+
+import json
+import glob
+import os
+import sys
+import argparse
+import subprocess
+import platform
+from typing import List, Dict, Any
+import logging
+from datetime import datetime
+
+# Optional imports with graceful fallback
+GRAPHING_AVAILABLE = True
+try:
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+ import numpy as np
+except ImportError as e:
+ GRAPHING_AVAILABLE = False
+ print(f"Warning: Graphing libraries not available: {e}")
+ print("Install with: pip install pandas matplotlib seaborn numpy")
+
+
+class ResultsAnalyzer:
+ def __init__(self, results_dir: str, output_dir: str, config: Dict[str, Any]):
+ self.results_dir = results_dir
+ self.output_dir = output_dir
+ self.config = config
+ self.results_data = []
+
+ # Setup logging
+ logging.basicConfig(
+ level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
+ )
+ self.logger = logging.getLogger(__name__)
+
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Collect system information for DUT details
+ self.system_info = self._collect_system_info()
+
+ def _collect_system_info(self) -> Dict[str, Any]:
+ """Collect system information for DUT details in HTML report"""
+ info = {}
+
+ try:
+ # Basic system information
+ info["hostname"] = platform.node()
+ info["platform"] = platform.platform()
+ info["architecture"] = platform.architecture()[0]
+ info["processor"] = platform.processor()
+
+ # Memory information
+ try:
+ with open("/proc/meminfo", "r") as f:
+ meminfo = f.read()
+ for line in meminfo.split("\n"):
+ if "MemTotal:" in line:
+ info["total_memory"] = line.split()[1] + " kB"
+ break
+ except Exception:
+ info["total_memory"] = "Unknown"
+
+ # CPU information
+ try:
+ with open("/proc/cpuinfo", "r") as f:
+ cpuinfo = f.read()
+ cpu_count = cpuinfo.count("processor")
+ info["cpu_count"] = cpu_count
+
+ # Extract CPU model
+ for line in cpuinfo.split("\n"):
+ if "model name" in line:
+ info["cpu_model"] = line.split(":", 1)[1].strip()
+ break
+ except Exception:
+ info["cpu_count"] = "Unknown"
+ info["cpu_model"] = "Unknown"
+
+ # Storage information
+ info["storage_devices"] = self._get_storage_info()
+
+ # Virtualization detection
+ info["is_vm"] = self._detect_virtualization()
+
+ # Filesystem information for AI data directory
+ info["filesystem_info"] = self._get_filesystem_info()
+
+ except Exception as e:
+ self.logger.warning(f"Error collecting system information: {e}")
+
+ return info
+
+ def _get_storage_info(self) -> List[Dict[str, str]]:
+ """Get storage device information including NVMe details"""
+ devices = []
+
+ try:
+ # Get block devices
+ result = subprocess.run(
+ ["lsblk", "-J", "-o", "NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE"],
+ capture_output=True,
+ text=True,
+ )
+ if result.returncode == 0:
+ lsblk_data = json.loads(result.stdout)
+ for device in lsblk_data.get("blockdevices", []):
+ if device.get("type") == "disk":
+ dev_info = {
+ "name": device.get("name", ""),
+ "size": device.get("size", ""),
+ "type": "disk",
+ }
+
+ # Check if it's NVMe and get additional details
+ if device.get("name", "").startswith("nvme"):
+ nvme_info = self._get_nvme_info(device.get("name", ""))
+ dev_info.update(nvme_info)
+
+ devices.append(dev_info)
+ except Exception as e:
+ self.logger.warning(f"Error getting storage info: {e}")
+
+ return devices
+
+ def _get_nvme_info(self, device_name: str) -> Dict[str, str]:
+ """Get detailed NVMe device information"""
+ nvme_info = {}
+
+ try:
+ # Get NVMe identify info
+ result = subprocess.run(
+ ["nvme", "id-ctrl", f"/dev/{device_name}"],
+ capture_output=True,
+ text=True,
+ )
+ if result.returncode == 0:
+ output = result.stdout
+ for line in output.split("\n"):
+ if "mn :" in line:
+ nvme_info["model"] = line.split(":", 1)[1].strip()
+ elif "fr :" in line:
+ nvme_info["firmware"] = line.split(":", 1)[1].strip()
+ elif "sn :" in line:
+ nvme_info["serial"] = line.split(":", 1)[1].strip()
+ except Exception as e:
+ self.logger.debug(f"Could not get NVMe info for {device_name}: {e}")
+
+ return nvme_info
+
+ def _detect_virtualization(self) -> str:
+ """Detect if running in a virtual environment"""
+ try:
+ # Check systemd-detect-virt
+ result = subprocess.run(
+ ["systemd-detect-virt"], capture_output=True, text=True
+ )
+ if result.returncode == 0:
+ virt_type = result.stdout.strip()
+ return virt_type if virt_type != "none" else "Physical"
+ except Exception:
+ pass
+
+ try:
+ # Check dmesg for virtualization hints
+ result = subprocess.run(["dmesg"], capture_output=True, text=True)
+ if result.returncode == 0:
+ dmesg_output = result.stdout.lower()
+ if "kvm" in dmesg_output:
+ return "KVM"
+ elif "vmware" in dmesg_output:
+ return "VMware"
+ elif "virtualbox" in dmesg_output:
+ return "VirtualBox"
+ elif "xen" in dmesg_output:
+ return "Xen"
+ except Exception:
+ pass
+
+ return "Unknown"
+
+ def _get_filesystem_info(self) -> Dict[str, str]:
+ """Get filesystem information for the AI benchmark directory"""
+ fs_info = {}
+
+ try:
+ # Get filesystem info for the results directory
+ result = subprocess.run(
+ ["df", "-T", self.results_dir], capture_output=True, text=True
+ )
+ if result.returncode == 0:
+ lines = result.stdout.strip().split("\n")
+ if len(lines) > 1:
+ fields = lines[1].split()
+ if len(fields) >= 2:
+ fs_info["filesystem_type"] = fields[1]
+ fs_info["mount_point"] = (
+ fields[6] if len(fields) > 6 else "Unknown"
+ )
+
+ # Get mount options
+ try:
+ with open("/proc/mounts", "r") as f:
+ for line in f:
+ parts = line.split()
+ if (
+ len(parts) >= 4
+ and parts[1] == fs_info.get("mount_point", "")
+ ):
+ fs_info["mount_options"] = parts[3]
+ break
+ except Exception:
+ pass
+ except Exception as e:
+ self.logger.warning(f"Error getting filesystem info: {e}")
+
+ return fs_info
+
+ def load_results(self) -> bool:
+ """Load all result files from the results directory"""
+ try:
+ pattern = os.path.join(self.results_dir, "results_*.json")
+ result_files = glob.glob(pattern)
+
+ if not result_files:
+ self.logger.warning(f"No result files found in {self.results_dir}")
+ return False
+
+ self.logger.info(f"Found {len(result_files)} result files")
+
+ for file_path in result_files:
+ try:
+ with open(file_path, "r") as f:
+ data = json.load(f)
+ data["_file"] = os.path.basename(file_path)
+ self.results_data.append(data)
+ except Exception as e:
+ self.logger.error(f"Error loading {file_path}: {e}")
+
+ self.logger.info(
+ f"Successfully loaded {len(self.results_data)} result sets"
+ )
+ return len(self.results_data) > 0
+
+ except Exception as e:
+ self.logger.error(f"Error loading results: {e}")
+ return False
+
+ def generate_summary_report(self) -> str:
+ """Generate a text summary report"""
+ try:
+ report = []
+ report.append("=" * 80)
+ report.append("AI BENCHMARK RESULTS SUMMARY")
+ report.append("=" * 80)
+ report.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+ report.append(f"Total result sets: {len(self.results_data)}")
+ report.append("")
+
+ if not self.results_data:
+ report.append("No results to analyze.")
+ return "\n".join(report)
+
+ # Configuration summary
+ first_result = self.results_data[0]
+ config = first_result.get("config", {})
+
+ report.append("CONFIGURATION:")
+ report.append(
+ f" Vector dataset size: {config.get('vector_dataset_size', 'N/A'):,}"
+ )
+ report.append(
+ f" Vector dimensions: {config.get('vector_dimensions', 'N/A')}"
+ )
+ report.append(f" Index type: {config.get('index_type', 'N/A')}")
+ report.append(f" Benchmark iterations: {len(self.results_data)}")
+ report.append("")
+
+ # Insert performance summary
+ insert_times = []
+ insert_rates = []
+ for result in self.results_data:
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ insert_times.append(insert_perf.get("total_time_seconds", 0))
+ insert_rates.append(insert_perf.get("vectors_per_second", 0))
+
+ if insert_times:
+ report.append("INSERT PERFORMANCE:")
+ report.append(
+ f" Average insert time: {np.mean(insert_times):.2f} seconds"
+ )
+ report.append(
+ f" Average insert rate: {np.mean(insert_rates):.2f} vectors/sec"
+ )
+ report.append(
+ f" Insert rate range: {np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec"
+ )
+ report.append("")
+
+ # Index performance summary
+ index_times = []
+ for result in self.results_data:
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ index_times.append(index_perf.get("creation_time_seconds", 0))
+
+ if index_times:
+ report.append("INDEX PERFORMANCE:")
+ report.append(
+ f" Average index creation time: {np.mean(index_times):.2f} seconds"
+ )
+ report.append(
+ f" Index time range: {np.min(index_times):.2f} - {np.max(index_times):.2f} seconds"
+ )
+ report.append("")
+
+ # Query performance summary
+ report.append("QUERY PERFORMANCE:")
+ for result in self.results_data:
+ query_perf = result.get("query_performance", {})
+ if query_perf:
+ for topk, topk_data in query_perf.items():
+ report.append(f" {topk.upper()}:")
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = batch_data.get("average_time_seconds", 0)
+ report.append(
+ f" {batch}: {qps:.2f} QPS, {avg_time*1000:.2f}ms avg"
+ )
+ break # Only show first result for summary
+
+ return "\n".join(report)
+
+ except Exception as e:
+ self.logger.error(f"Error generating summary report: {e}")
+ return f"Error generating summary: {e}"
+
+ def generate_html_report(self) -> str:
+ """Generate comprehensive HTML report with DUT details and test configuration"""
+ try:
+ html = []
+
+ # HTML header
+ html.append("<!DOCTYPE html>")
+ html.append("<html lang='en'>")
+ html.append("<head>")
+ html.append(" <meta charset='UTF-8'>")
+ html.append(
+ " <meta name='viewport' content='width=device-width, initial-scale=1.0'>"
+ )
+ html.append(" <title>AI Benchmark Results Report</title>")
+ html.append(" <style>")
+ html.append(
+ " body { font-family: Arial, sans-serif; margin: 20px; line-height: 1.6; }"
+ )
+ html.append(
+ " .header { background-color: #f4f4f4; padding: 20px; border-radius: 5px; margin-bottom: 20px; }"
+ )
+ html.append(" .section { margin-bottom: 30px; }")
+ html.append(
+ " .section h2 { color: #333; border-bottom: 2px solid #007acc; padding-bottom: 5px; }"
+ )
+ html.append(" .section h3 { color: #555; }")
+ html.append(
+ " table { border-collapse: collapse; width: 100%; margin-bottom: 20px; }"
+ )
+ html.append(
+ " th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }"
+ )
+ html.append(" th { background-color: #f2f2f2; font-weight: bold; }")
+ html.append(
+ " .metric-table td:first-child { font-weight: bold; width: 30%; }"
+ )
+ html.append(
+ " .config-table td:first-child { font-weight: bold; width: 40%; }"
+ )
+ html.append(" .performance-good { color: #27ae60; }")
+ html.append(" .performance-warning { color: #f39c12; }")
+ html.append(" .performance-poor { color: #e74c3c; }")
+ html.append(
+ " .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
+ )
+ html.append(" </style>")
+ html.append("</head>")
+ html.append("<body>")
+
+ # Report header
+ html.append(" <div class='header'>")
+ html.append(" <h1>AI Benchmark Results Report</h1>")
+ html.append(
+ f" <p><strong>Generated:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>"
+ )
+ html.append(
+ f" <p><strong>Test Results:</strong> {len(self.results_data)} benchmark iterations</p>"
+ )
+
+ # Test type identification
+ html.append(" <div class='highlight'>")
+ html.append(" <h3>🤖 AI Workflow Test Type</h3>")
+ html.append(
+ " <p><strong>Vector Database Performance Testing</strong> using <strong>Milvus Vector Database</strong></p>"
+ )
+ html.append(
+ " <p>This test evaluates AI workload performance including vector insertion, indexing, and similarity search operations.</p>"
+ )
+ html.append(" </div>")
+ html.append(" </div>")
+
+ # Device Under Test (DUT) Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📋 Device Under Test (DUT) Details</h2>")
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><td>Hostname</td><td>"
+ + str(self.system_info.get("hostname", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>System Type</td><td>"
+ + str(self.system_info.get("is_vm", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Platform</td><td>"
+ + str(self.system_info.get("platform", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Architecture</td><td>"
+ + str(self.system_info.get("architecture", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>CPU Model</td><td>"
+ + str(self.system_info.get("cpu_model", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>CPU Count</td><td>"
+ + str(self.system_info.get("cpu_count", "Unknown"))
+ + " cores</td></tr>"
+ )
+ html.append(
+ " <tr><td>Total Memory</td><td>"
+ + str(self.system_info.get("total_memory", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Storage devices section
+ html.append(" <h3>💾 Storage Configuration</h3>")
+ storage_devices = self.system_info.get("storage_devices", [])
+ if storage_devices:
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Device</th><th>Size</th><th>Type</th><th>Model</th><th>Firmware</th></tr>"
+ )
+ for device in storage_devices:
+ model = device.get("model", "N/A")
+ firmware = device.get("firmware", "N/A")
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{device.get('name', 'Unknown')}</td>"
+ )
+ html.append(
+ f" <td>{device.get('size', 'Unknown')}</td>"
+ )
+ html.append(
+ f" <td>{device.get('type', 'Unknown')}</td>"
+ )
+ html.append(f" <td>{model}</td>")
+ html.append(f" <td>{firmware}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+ else:
+ html.append(" <p>No storage device information available.</p>")
+
+ # Filesystem section
+ html.append(" <h3>🗂️ Filesystem Configuration</h3>")
+ fs_info = self.system_info.get("filesystem_info", {})
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><td>Filesystem Type</td><td>"
+ + str(fs_info.get("filesystem_type", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Mount Point</td><td>"
+ + str(fs_info.get("mount_point", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(
+ " <tr><td>Mount Options</td><td>"
+ + str(fs_info.get("mount_options", "Unknown"))
+ + "</td></tr>"
+ )
+ html.append(" </table>")
+ html.append(" </div>")
+
+ # Test Configuration Section
+ if self.results_data:
+ first_result = self.results_data[0]
+ config = first_result.get("config", {})
+
+ html.append(" <div class='section'>")
+ html.append(" <h2>⚙️ AI Test Configuration</h2>")
+ html.append(" <table class='config-table'>")
+ html.append(
+ f" <tr><td>Vector Dataset Size</td><td>{config.get('vector_dataset_size', 'N/A'):,} vectors</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Vector Dimensions</td><td>{config.get('vector_dimensions', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Type</td><td>{config.get('index_type', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Benchmark Iterations</td><td>{len(self.results_data)}</td></tr>"
+ )
+
+ # Add index-specific parameters
+ if config.get("index_type") == "HNSW":
+ html.append(
+ f" <tr><td>HNSW M Parameter</td><td>{config.get('hnsw_m', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>HNSW ef Construction</td><td>{config.get('hnsw_ef_construction', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>HNSW ef Search</td><td>{config.get('hnsw_ef', 'N/A')}</td></tr>"
+ )
+ elif config.get("index_type") == "IVF_FLAT":
+ html.append(
+ f" <tr><td>IVF nlist</td><td>{config.get('ivf_nlist', 'N/A')}</td></tr>"
+ )
+ html.append(
+ f" <tr><td>IVF nprobe</td><td>{config.get('ivf_nprobe', 'N/A')}</td></tr>"
+ )
+
+ html.append(" </table>")
+ html.append(" </div>")
+
+ # Performance Results Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📊 Performance Results Summary</h2>")
+
+ if self.results_data:
+ # Insert performance
+ insert_times = [
+ r.get("insert_performance", {}).get("total_time_seconds", 0)
+ for r in self.results_data
+ ]
+ insert_rates = [
+ r.get("insert_performance", {}).get("vectors_per_second", 0)
+ for r in self.results_data
+ ]
+
+ if insert_times and any(t > 0 for t in insert_times):
+ html.append(" <h3>📈 Vector Insert Performance</h3>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Index performance
+ index_times = [
+ r.get("index_performance", {}).get("creation_time_seconds", 0)
+ for r in self.results_data
+ ]
+ if index_times and any(t > 0 for t in index_times):
+ html.append(" <h3>🔗 Index Creation Performance</h3>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Query performance
+ html.append(" <h3>🔍 Query Performance</h3>")
+ first_query_perf = self.results_data[0].get("query_performance", {})
+ if first_query_perf:
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ )
+
+ for topk, topk_data in first_query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = batch_data.get("average_time_seconds", 0) * 1000
+
+ # Color coding for performance
+ qps_class = ""
+ if qps > 1000:
+ qps_class = "performance-good"
+ elif qps > 100:
+ qps_class = "performance-warning"
+ else:
+ qps_class = "performance-poor"
+
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{topk.replace('topk_', 'Top-')}</td>"
+ )
+ html.append(
+ f" <td>{batch.replace('batch_', 'Batch ')}</td>"
+ )
+ html.append(
+ f" <td class='{qps_class}'>{qps:.2f}</td>"
+ )
+ html.append(f" <td>{avg_time:.2f}</td>")
+ html.append(f" </tr>")
+
+ html.append(" </table>")
+
+ html.append(" </div>")
+
+ # Footer
+ html.append(" <div class='section'>")
+ html.append(" <h2>📝 Notes</h2>")
+ html.append(" <ul>")
+ html.append(
+ " <li>This report was generated automatically by the AI benchmark analysis tool</li>"
+ )
+ html.append(
+ " <li>Performance metrics are averaged across all benchmark iterations</li>"
+ )
+ html.append(
+ " <li>QPS (Queries Per Second) values are color-coded: <span class='performance-good'>Green (>1000)</span>, <span class='performance-warning'>Orange (100-1000)</span>, <span class='performance-poor'>Red (<100)</span></li>"
+ )
+ html.append(
+ " <li>Storage device information may require root privileges to display NVMe details</li>"
+ )
+ html.append(" </ul>")
+ html.append(" </div>")
+
+ html.append("</body>")
+ html.append("</html>")
+
+ return "\n".join(html)
+
+ except Exception as e:
+ self.logger.error(f"Error generating HTML report: {e}")
+ return (
+ f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
+ )
+
+ def generate_graphs(self) -> bool:
+ """Generate performance visualization graphs"""
+ if not GRAPHING_AVAILABLE:
+ self.logger.warning(
+ "Graphing libraries not available, skipping graph generation"
+ )
+ return False
+
+ try:
+ # Set matplotlib style
+ if self.config.get("graph_theme", "default") != "default":
+ plt.style.use(self.config["graph_theme"])
+
+ # Graph 1: Insert Performance
+ self._plot_insert_performance()
+
+ # Graph 2: Query Performance by Top-K
+ self._plot_query_performance()
+
+ # Graph 3: Index Creation Time
+ self._plot_index_performance()
+
+ # Graph 4: Performance Comparison Matrix
+ self._plot_performance_matrix()
+
+ self.logger.info("Graphs generated successfully")
+ return True
+
+ except Exception as e:
+ self.logger.error(f"Error generating graphs: {e}")
+ return False
+
+ def _plot_insert_performance(self):
+ """Plot insert performance metrics"""
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # Extract insert data
+ iterations = []
+ insert_rates = []
+ insert_times = []
+
+ for i, result in enumerate(self.results_data):
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ iterations.append(i + 1)
+ insert_rates.append(insert_perf.get("vectors_per_second", 0))
+ insert_times.append(insert_perf.get("total_time_seconds", 0))
+
+ # Plot insert rate
+ ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Vector Insert Rate Performance")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Vector Insert Time Performance")
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ output_file = os.path.join(
+ self.output_dir,
+ f"insert_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_query_performance(self):
+ """Plot query performance metrics"""
+ if not self.results_data:
+ return
+
+ # Collect query performance data
+ query_data = []
+ for result in self.results_data:
+ query_perf = result.get("query_performance", {})
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ query_data.append(
+ {
+ "topk": topk.replace("topk_", ""),
+ "batch": batch.replace("batch_", ""),
+ "qps": batch_data.get("queries_per_second", 0),
+ "avg_time": batch_data.get("average_time_seconds", 0)
+ * 1000, # Convert to ms
+ }
+ )
+
+ if not query_data:
+ return
+
+ df = pd.DataFrame(query_data)
+
+ # Create subplots
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # QPS heatmap
+ qps_pivot = df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
+ ax1.set_title("Queries Per Second (QPS)")
+ ax1.set_xlabel("Batch Size")
+ ax1.set_ylabel("Top-K")
+
+ # Latency heatmap
+ latency_pivot = df.pivot_table(
+ values="avg_time", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
+ ax2.set_title("Average Query Latency (ms)")
+ ax2.set_xlabel("Batch Size")
+ ax2.set_ylabel("Top-K")
+
+ plt.tight_layout()
+ output_file = os.path.join(
+ self.output_dir,
+ f"query_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_index_performance(self):
+ """Plot index creation performance"""
+ iterations = []
+ index_times = []
+
+ for i, result in enumerate(self.results_data):
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ iterations.append(i + 1)
+ index_times.append(index_perf.get("creation_time_seconds", 0))
+
+ if not index_times:
+ return
+
+ plt.figure(figsize=(10, 6))
+ plt.bar(iterations, index_times, alpha=0.7, color="green")
+ plt.xlabel("Iteration")
+ plt.ylabel("Index Creation Time (seconds)")
+ plt.title("Index Creation Performance")
+ plt.grid(True, alpha=0.3)
+
+ # Add average line
+ avg_time = np.mean(index_times)
+ plt.axhline(
+ y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
+ )
+ plt.legend()
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"index_performance.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def _plot_performance_matrix(self):
+ """Plot comprehensive performance comparison matrix"""
+ if len(self.results_data) < 2:
+ return
+
+ # Extract key metrics for comparison
+ metrics = []
+ for i, result in enumerate(self.results_data):
+ insert_perf = result.get("insert_performance", {})
+ index_perf = result.get("index_performance", {})
+
+ metric = {
+ "iteration": i + 1,
+ "insert_rate": insert_perf.get("vectors_per_second", 0),
+ "index_time": index_perf.get("creation_time_seconds", 0),
+ }
+
+ # Add query metrics
+ query_perf = result.get("query_performance", {})
+ if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
+ metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
+ "queries_per_second", 0
+ )
+
+ metrics.append(metric)
+
+ df = pd.DataFrame(metrics)
+
+ # Normalize metrics for comparison (0..1 per column)
+ numeric_cols = ["insert_rate", "index_time", "query_qps"]
+ for col in numeric_cols:
+ if col in df.columns:
+ df[f"{col}_norm"] = (df[col] - df[col].min()) / (
+ df[col].max() - df[col].min() + 1e-6
+ )
+
+ # Invert index creation time so that faster runs score higher; this is
+ # what the "Index Time (inv)" axis label below refers to.
+ if "index_time_norm" in df.columns:
+ df["index_time_norm"] = 1 - df["index_time_norm"]
+
+ # Create radar chart
+ fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
+
+ angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
+ angles += angles[:1] # Complete the circle
+
+ for i, row in df.iterrows():
+ values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
+ values += values[:1] # Complete the circle
+
+ ax.plot(
+ angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
+ )
+ ax.fill(angles, values, alpha=0.25)
+
+ ax.set_xticks(angles[:-1])
+ ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
+ ax.set_ylim(0, 1)
+ ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
+ ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"performance_matrix.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
+ def analyze(self) -> bool:
+ """Run complete analysis"""
+ self.logger.info("Starting results analysis...")
+
+ if not self.load_results():
+ return False
+
+ # Generate summary report
+ summary = self.generate_summary_report()
+ summary_file = os.path.join(self.output_dir, "benchmark_summary.txt")
+ with open(summary_file, "w") as f:
+ f.write(summary)
+ self.logger.info(f"Summary report saved to {summary_file}")
+
+ # Generate HTML report
+ html_report = self.generate_html_report()
+ html_file = os.path.join(self.output_dir, "benchmark_report.html")
+ with open(html_file, "w") as f:
+ f.write(html_report)
+ self.logger.info(f"HTML report saved to {html_file}")
+
+ # Generate graphs if enabled
+ if self.config.get("enable_graphing", True):
+ self.generate_graphs()
+
+ # Create consolidated JSON report
+ consolidated_file = os.path.join(self.output_dir, "consolidated_results.json")
+ with open(consolidated_file, "w") as f:
+ json.dump(
+ {
+ "summary": summary.split("\n"),
+ "raw_results": self.results_data,
+ "analysis_timestamp": datetime.now().isoformat(),
+ "system_info": self.system_info,
+ },
+ f,
+ indent=2,
+ )
+
+ self.logger.info("Analysis completed successfully")
+ return True
+
+
+def main():
+ parser = argparse.ArgumentParser(description="Analyze AI benchmark results")
+ parser.add_argument(
+ "--results-dir", required=True, help="Directory containing result files"
+ )
+ parser.add_argument(
+ "--output-dir", required=True, help="Directory for analysis output"
+ )
+ parser.add_argument("--config", help="Analysis configuration file (JSON)")
+
+ args = parser.parse_args()
+
+ # Load configuration
+ config = {
+ "enable_graphing": True,
+ "graph_format": "png",
+ "graph_dpi": 300,
+ "graph_theme": "default",
+ }
+
+ if args.config:
+ try:
+ with open(args.config, "r") as f:
+ config.update(json.load(f))
+ except Exception as e:
+ print(f"Error loading config file: {e}")
+
+ # Run analysis
+ analyzer = ResultsAnalyzer(args.results_dir, args.output_dir, config)
+ success = analyzer.analyze()
+
+ return 0 if success else 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/workflows/ai/scripts/generate_graphs.py b/workflows/ai/scripts/generate_graphs.py
new file mode 100755
index 00000000..2e183e86
--- /dev/null
+++ b/workflows/ai/scripts/generate_graphs.py
@@ -0,0 +1,1174 @@
+#!/usr/bin/env python3
+"""
+Generate graphs and analysis for AI benchmark results
+"""
+
+import json
+import os
+import sys
+import glob
+import numpy as np
+import matplotlib
+
+matplotlib.use("Agg") # Use non-interactive backend
+import matplotlib.pyplot as plt
+from datetime import datetime
+from pathlib import Path
+from collections import defaultdict
+
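+# Shape of each results_*.json file as consumed below (the values shown are
+# illustrative):
+# {
+#   "filesystem": "xfs", "block_size": "4k",
+#   "insert_performance": {"vectors_per_second": 1234.5},
+#   "query_performance": {"topk_10": {"batch_1": {
+#       "queries_per_second": 950.0}}}
+# }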
+
+def load_results(results_dir):
+ """Load all JSON result files from the directory"""
+ results = []
+ # Only load results_*.json files, not consolidated or other JSON files
+ json_files = glob.glob(os.path.join(results_dir, "results_*.json"))
+
+ for json_file in json_files:
+ try:
+ with open(json_file, "r") as f:
+ data = json.load(f)
+ # Extract filesystem info - prefer from JSON data over filename
+ filename = os.path.basename(json_file)
+
+ # First, try to get filesystem from the JSON data itself
+ fs_type = data.get("filesystem", None)
+
+ # If not in JSON, try to parse from filename (backwards compatibility)
+ if not fs_type:
+ parts = (
+ filename.replace("results_", "").replace(".json", "").split("-")
+ )
+
+ # Parse host info
+ if "debian13-ai-" in filename:
+ host_parts = (
+ filename.replace("results_debian13-ai-", "")
+ .replace("_1.json", "")
+ .replace("_2.json", "")
+ .replace("_3.json", "")
+ .split("-")
+ )
+ if "xfs" in host_parts[0]:
+ fs_type = "xfs"
+ # Extract block size (e.g., "4k", "16k", etc.)
+ block_size = (
+ host_parts[1] if len(host_parts) > 1 else "unknown"
+ )
+ elif "ext4" in host_parts[0]:
+ fs_type = "ext4"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "btrfs" in host_parts[0]:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ # If filesystem came from JSON, set appropriate block size
+ if fs_type == "btrfs":
+ block_size = "default"
+ elif fs_type in ["ext4", "xfs"]:
+ block_size = data.get("block_size", "4k")
+ else:
+ block_size = data.get("block_size", "default")
+
+ is_dev = "dev" in filename
+
+ # Use filesystem from JSON if available, otherwise use parsed value
+ if "filesystem" not in data:
+ data["filesystem"] = fs_type
+ data["block_size"] = block_size
+ data["is_dev"] = is_dev
+ data["filename"] = filename
+
+ results.append(data)
+ except Exception as e:
+ print(f"Error loading {json_file}: {e}")
+
+ return results
+
+
+def create_filesystem_comparison_chart(results, output_dir):
+ """Create a bar chart comparing performance across filesystems"""
+ # Group by filesystem and baseline/dev
+ fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ category = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Extract actual performance data from results
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+ fs_data[fs][category].append(insert_qps)
+
+ # Prepare data for plotting
+ filesystems = list(fs_data.keys())
+ baseline_means = [
+ np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
+ for fs in filesystems
+ ]
+ dev_means = [
+ np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
+ ]
+
+ x = np.arange(len(filesystems))
+ width = 0.35
+
+ fig, ax = plt.subplots(figsize=(10, 6))
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
+ )
+
+ ax.set_xlabel("Filesystem")
+ ax.set_ylabel("Insert QPS")
+ ax.set_title("Vector Database Performance by Filesystem")
+ ax.set_xticks(x)
+ ax.set_xticklabels(filesystems)
+ ax.legend()
+ ax.grid(True, alpha=0.3)
+
+ # Add value labels on bars
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
+ plt.close()
+
+
+def create_block_size_analysis(results, output_dir):
+ """Create analysis for different block sizes (XFS specific)"""
+ # Filter XFS results
+ xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
+
+ if not xfs_results:
+ return
+
+ # Group by block size
+ block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in xfs_results:
+ block_size = result.get("block_size", "unknown")
+ category = "dev" if result.get("is_dev", False) else "baseline"
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+ block_size_data[block_size][category].append(insert_qps)
+
+ # Sort block sizes
+ block_sizes = sorted(
+ block_size_data.keys(),
+ key=lambda x: (
+ int(x.replace("k", "").replace("s", ""))
+ if x not in ["unknown", "default"]
+ else 0
+ ),
+ )
+
+ # Create grouped bar chart
+ baseline_means = [
+ (
+ np.mean(block_size_data[bs]["baseline"])
+ if block_size_data[bs]["baseline"]
+ else 0
+ )
+ for bs in block_sizes
+ ]
+ dev_means = [
+ np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
+ for bs in block_sizes
+ ]
+
+ x = np.arange(len(block_sizes))
+ width = 0.35
+
+ fig, ax = plt.subplots(figsize=(12, 6))
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_means, width, label="Development", color="#d62728"
+ )
+
+ ax.set_xlabel("Block Size")
+ ax.set_ylabel("Insert QPS")
+ ax.set_title("XFS Performance by Block Size")
+ ax.set_xticks(x)
+ ax.set_xticklabels(block_sizes)
+ ax.legend()
+ ax.grid(True, alpha=0.3)
+
+ # Add value labels
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
+ plt.close()
+
+
+def create_heatmap_analysis(results, output_dir):
+ """Create a heatmap showing AVERAGE performance across all test iterations"""
+ # Group data by configuration and version, collecting ALL values for averaging
+ config_data = defaultdict(
+ lambda: {
+ "baseline": {"insert": [], "query": [], "count": 0},
+ "dev": {"insert": [], "query": [], "count": 0},
+ }
+ )
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ config = f"{fs}-{block_size}"
+ version = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Get actual insert performance
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ # Collect all values for averaging
+ config_data[config][version]["insert"].append(insert_qps)
+ config_data[config][version]["query"].append(query_qps)
+ config_data[config][version]["count"] += 1
+
+ # Sort configurations
+ configs = sorted(config_data.keys())
+
+ # Calculate averages for heatmap
+ insert_baseline = []
+ insert_dev = []
+ query_baseline = []
+ query_dev = []
+ iteration_counts = {"baseline": 0, "dev": 0}
+
+ for c in configs:
+ # Calculate average insert QPS
+ baseline_insert_vals = config_data[c]["baseline"]["insert"]
+ insert_baseline.append(
+ np.mean(baseline_insert_vals) if baseline_insert_vals else 0
+ )
+
+ dev_insert_vals = config_data[c]["dev"]["insert"]
+ insert_dev.append(np.mean(dev_insert_vals) if dev_insert_vals else 0)
+
+ # Calculate average query QPS
+ baseline_query_vals = config_data[c]["baseline"]["query"]
+ query_baseline.append(
+ np.mean(baseline_query_vals) if baseline_query_vals else 0
+ )
+
+ dev_query_vals = config_data[c]["dev"]["query"]
+ query_dev.append(np.mean(dev_query_vals) if dev_query_vals else 0)
+
+ # Track iteration counts
+ iteration_counts["baseline"] = max(
+ iteration_counts["baseline"], len(baseline_insert_vals)
+ )
+ iteration_counts["dev"] = max(iteration_counts["dev"], len(dev_insert_vals))
+
+ # Create figure with custom heatmap
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
+
+ # Create data matrices
+ insert_data = np.array([insert_baseline, insert_dev]).T
+ query_data = np.array([query_baseline, query_dev]).T
+
+ # Insert QPS heatmap
+ im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
+ ax1.set_xticks([0, 1])
+ ax1.set_xticklabels(["Baseline", "Development"])
+ ax1.set_yticks(range(len(configs)))
+ ax1.set_yticklabels(configs)
+ ax1.set_title(
+ f"Insert Performance - AVERAGE across {iteration_counts['baseline']} iterations\n(1M vectors, 128 dims, HNSW index)"
+ )
+ ax1.set_ylabel("Configuration")
+
+ # Add text annotations with dynamic color based on background
+ # Get the colormap to determine actual colors
+ cmap1 = plt.cm.YlOrRd
+ norm1 = plt.Normalize(vmin=insert_data.min(), vmax=insert_data.max())
+
+ for i in range(len(configs)):
+ for j in range(2):
+ # Get the actual color from the colormap
+ val = insert_data[i, j]
+ rgba = cmap1(norm1(val))
+ # Calculate luminance using standard formula
+ # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
+ luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
+ # Use white text on dark backgrounds (low luminance)
+ text_color = "white" if luminance < 0.5 else "black"
+
+ # Show average value with indicator
+ text = ax1.text(
+ j,
+ i,
+ f"{int(insert_data[i, j])}\n(avg)",
+ ha="center",
+ va="center",
+ color=text_color,
+ fontweight="bold",
+ fontsize=9,
+ )
+
+ # Add colorbar
+ cbar1 = plt.colorbar(im1, ax=ax1)
+ cbar1.set_label("Insert QPS")
+
+ # Query QPS heatmap
+ im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
+ ax2.set_xticks([0, 1])
+ ax2.set_xticklabels(["Baseline", "Development"])
+ ax2.set_yticks(range(len(configs)))
+ ax2.set_yticklabels(configs)
+ ax2.set_title(
+ f"Query Performance - AVERAGE across {iteration_counts['dev']} iterations\n(1M vectors, 128 dims, HNSW index)"
+ )
+
+ # Add text annotations with dynamic color based on background
+ # Get the colormap to determine actual colors
+ cmap2 = plt.cm.YlGnBu
+ norm2 = plt.Normalize(vmin=query_data.min(), vmax=query_data.max())
+
+ for i in range(len(configs)):
+ for j in range(2):
+ # Get the actual color from the colormap
+ val = query_data[i, j]
+ rgba = cmap2(norm2(val))
+ # Calculate luminance using standard formula
+ # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
+ luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
+ # Use white text on dark backgrounds (low luminance)
+ text_color = "white" if luminance < 0.5 else "black"
+
+ # Show average value with indicator
+ text = ax2.text(
+ j,
+ i,
+ f"{int(query_data[i, j])}\n(avg)",
+ ha="center",
+ va="center",
+ color=text_color,
+ fontweight="bold",
+ fontsize=9,
+ )
+
+ # Add colorbar
+ cbar2 = plt.colorbar(im2, ax=ax2)
+ cbar2.set_label("Query QPS")
+
+ # Add overall figure title
+ fig.suptitle(
+ "Performance Heatmap - Showing AVERAGES across Multiple Test Iterations",
+ fontsize=14,
+ fontweight="bold",
+ y=1.02,
+ )
+
+ plt.tight_layout()
+ plt.savefig(
+ os.path.join(output_dir, "performance_heatmap.png"),
+ dpi=150,
+ bbox_inches="tight",
+ )
+ plt.close()
+
+
+def create_performance_trends(results, output_dir):
+ """Create line charts showing performance trends"""
+ # Group by filesystem type
+ fs_types = defaultdict(
+ lambda: {
+ "configs": [],
+ "baseline_insert": [],
+ "dev_insert": [],
+ "baseline_query": [],
+ "dev_query": [],
+ }
+ )
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ config = f"{block_size}"
+
+ if config not in fs_types[fs]["configs"]:
+ fs_types[fs]["configs"].append(config)
+ fs_types[fs]["baseline_insert"].append(0)
+ fs_types[fs]["dev_insert"].append(0)
+ fs_types[fs]["baseline_query"].append(0)
+ fs_types[fs]["dev_query"].append(0)
+
+ idx = fs_types[fs]["configs"].index(config)
+
+ # Calculate average query QPS from all test configurations
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ if result.get("is_dev", False):
+ if "insert_performance" in result:
+ fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
+ "vectors_per_second", 0
+ )
+ fs_types[fs]["dev_query"][idx] = query_qps
+ else:
+ if "insert_performance" in result:
+ fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
+ "vectors_per_second", 0
+ )
+ fs_types[fs]["baseline_query"][idx] = query_qps
+
+ # Create separate plots for each filesystem
+ for fs, data in fs_types.items():
+ if not data["configs"]:
+ continue
+
+ fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
+
+ x = range(len(data["configs"]))
+
+ # Insert performance
+ ax1.plot(
+ x,
+ data["baseline_insert"],
+ "o-",
+ label="Baseline",
+ linewidth=2,
+ markersize=8,
+ )
+ ax1.plot(
+ x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
+ )
+ ax1.set_xlabel("Configuration")
+ ax1.set_ylabel("Insert QPS")
+ ax1.set_title(f"{fs.upper()} Insert Performance")
+ ax1.set_xticks(x)
+ ax1.set_xticklabels(data["configs"])
+ ax1.legend()
+ ax1.grid(True, alpha=0.3)
+
+ # Query performance
+ ax2.plot(
+ x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
+ )
+ ax2.plot(
+ x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
+ )
+ ax2.set_xlabel("Configuration")
+ ax2.set_ylabel("Query QPS")
+ ax2.set_title(f"{fs.upper()} Query Performance")
+ ax2.set_xticks(x)
+ ax2.set_xticklabels(data["configs"])
+ ax2.legend()
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
+ plt.close()
+
+
+def create_simple_performance_trends(results, output_dir):
+ """Create a simple performance trends chart for basic Milvus testing"""
+ if not results:
+ return
+
+ # Extract configuration from first result for display
+ config_text = ""
+ if results:
+ first_result = results[0]
+ if "config" in first_result:
+ cfg = first_result["config"]
+ config_text = (
+ f"Test Config:\n"
+ f"• {cfg.get('vector_dataset_size', 'N/A'):,} vectors/iteration\n"
+ f"• {cfg.get('vector_dimensions', 'N/A')} dimensions\n"
+ f"• {cfg.get('index_type', 'N/A')} index"
+ )
+
+ # Separate baseline and dev results
+ baseline_results = [r for r in results if not r.get("is_dev", False)]
+ dev_results = [r for r in results if r.get("is_dev", False)]
+
+ if not baseline_results and not dev_results:
+ return
+
+ fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
+
+ # Prepare data
+ baseline_insert = []
+ baseline_query = []
+ dev_insert = []
+ dev_query = []
+ labels = []
+
+ # Process baseline results
+ for i, result in enumerate(baseline_results):
+ if "insert_performance" in result:
+ baseline_insert.append(
+ result["insert_performance"].get("vectors_per_second", 0)
+ )
+ else:
+ baseline_insert.append(0)
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+ baseline_query.append(query_qps)
+ labels.append(f"Iteration {i+1}")
+
+ # Process dev results
+ for result in dev_results:
+ if "insert_performance" in result:
+ dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
+ else:
+ dev_insert.append(0)
+
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+ dev_query.append(query_qps)
+
+ x = range(len(baseline_results) if baseline_results else len(dev_results))
+
+ # Insert performance - with visible markers for all points
+ if baseline_insert:
+ # Line plot with smaller markers
+ ax1.plot(
+ x,
+ baseline_insert,
+ "-",
+ label="Baseline",
+ linewidth=1.5,
+ color="blue",
+ alpha=0.6,
+ )
+ # Add distinct markers for each point
+ ax1.scatter(
+ x,
+ baseline_insert,
+ s=30,
+ color="blue",
+ alpha=0.8,
+ edgecolors="darkblue",
+ linewidth=0.5,
+ zorder=5,
+ )
+ if dev_insert:
+ # Line plot with smaller markers
+ ax1.plot(
+ x[: len(dev_insert)],
+ dev_insert,
+ "-",
+ label="Development",
+ linewidth=1.5,
+ color="red",
+ alpha=0.6,
+ )
+ # Add distinct markers for each point
+ ax1.scatter(
+ x[: len(dev_insert)],
+ dev_insert,
+ s=30,
+ color="red",
+ alpha=0.8,
+ edgecolors="darkred",
+ linewidth=0.5,
+ marker="s",
+ zorder=5,
+ )
+ ax1.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
+ ax1.set_ylabel("Insert QPS (vectors/second)")
+ ax1.set_title("Milvus Insert Performance")
+
+ # Handle x-axis labels to prevent overlap
+ num_points = len(x)
+ if num_points > 20:
+ # Show every 5th label for many iterations
+ step = 5
+ tick_positions = list(range(0, num_points, step))
+ tick_labels = [
+ labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
+ ]
+ ax1.set_xticks(tick_positions)
+ ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
+ elif num_points > 10:
+ # Show every 2nd label for moderate iterations
+ step = 2
+ tick_positions = list(range(0, num_points, step))
+ tick_labels = [
+ labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
+ ]
+ ax1.set_xticks(tick_positions)
+ ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
+ else:
+ # Show all labels for few iterations
+ ax1.set_xticks(x)
+ ax1.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
+
+ ax1.legend()
+ ax1.grid(True, alpha=0.3)
+
+ # Add configuration text box - compact
+ if config_text:
+ ax1.text(
+ 0.02,
+ 0.98,
+ config_text,
+ transform=ax1.transAxes,
+ fontsize=6,
+ verticalalignment="top",
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
+ )
+
+ # Query performance - with visible markers for all points
+ if baseline_query:
+ # Line plot
+ ax2.plot(
+ x,
+ baseline_query,
+ "-",
+ label="Baseline",
+ linewidth=1.5,
+ color="blue",
+ alpha=0.6,
+ )
+ # Add distinct markers for each point
+ ax2.scatter(
+ x,
+ baseline_query,
+ s=30,
+ color="blue",
+ alpha=0.8,
+ edgecolors="darkblue",
+ linewidth=0.5,
+ zorder=5,
+ )
+ if dev_query:
+ # Line plot
+ ax2.plot(
+ x[: len(dev_query)],
+ dev_query,
+ "-",
+ label="Development",
+ linewidth=1.5,
+ color="red",
+ alpha=0.6,
+ )
+ # Add distinct markers for each point
+ ax2.scatter(
+ x[: len(dev_query)],
+ dev_query,
+ s=30,
+ color="red",
+ alpha=0.8,
+ edgecolors="darkred",
+ linewidth=0.5,
+ marker="s",
+ zorder=5,
+ )
+ ax2.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
+ ax2.set_ylabel("Query QPS (queries/second)")
+ ax2.set_title("Milvus Query Performance")
+
+ # Handle x-axis labels to prevent overlap
+ num_points = len(x)
+ if num_points > 20:
+ # Show every 5th label for many iterations
+ step = 5
+ tick_positions = list(range(0, num_points, step))
+ tick_labels = [
+ labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
+ ]
+ ax2.set_xticks(tick_positions)
+ ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
+ elif num_points > 10:
+ # Show every 2nd label for moderate iterations
+ step = 2
+ tick_positions = list(range(0, num_points, step))
+ tick_labels = [
+ labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
+ ]
+ ax2.set_xticks(tick_positions)
+ ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
+ else:
+ # Show all labels for few iterations
+ ax2.set_xticks(x)
+ ax2.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
+
+ ax2.legend()
+ ax2.grid(True, alpha=0.3)
+
+ # Add configuration text box - compact
+ if config_text:
+ ax2.text(
+ 0.02,
+ 0.98,
+ config_text,
+ transform=ax2.transAxes,
+ fontsize=6,
+ verticalalignment="top",
+ bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+ plt.close()
+
+
+def generate_summary_statistics(results, output_dir):
+ """Generate summary statistics and save to JSON"""
+ # Get unique filesystems, excluding "unknown"
+ filesystems = set()
+ for r in results:
+ fs = r.get("filesystem", "unknown")
+ if fs != "unknown":
+ filesystems.add(fs)
+
+ summary = {
+ "total_tests": len(results),
+ "filesystems_tested": sorted(list(filesystems)),
+ "configurations": {},
+ "performance_summary": {
+ "best_insert_qps": {"value": 0, "config": ""},
+ "best_query_qps": {"value": 0, "config": ""},
+ "average_insert_qps": 0,
+ "average_query_qps": 0,
+ },
+ }
+
+ # Calculate statistics
+ all_insert_qps = []
+ all_query_qps = []
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "default")
+ is_dev = "dev" if result.get("is_dev", False) else "baseline"
+ config_name = f"{fs}-{block_size}-{is_dev}"
+
+ # Get actual performance metrics
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+
+ # Calculate average query QPS
+ query_qps = 0
+ if "query_performance" in result:
+ qp = result["query_performance"]
+ total_qps = 0
+ count = 0
+ for topk_key in ["topk_1", "topk_10", "topk_100"]:
+ if topk_key in qp:
+ for batch_key in ["batch_1", "batch_10", "batch_100"]:
+ if batch_key in qp[topk_key]:
+ total_qps += qp[topk_key][batch_key].get(
+ "queries_per_second", 0
+ )
+ count += 1
+ if count > 0:
+ query_qps = total_qps / count
+
+ all_insert_qps.append(insert_qps)
+ all_query_qps.append(query_qps)
+
+ summary["configurations"][config_name] = {
+ "insert_qps": insert_qps,
+ "query_qps": query_qps,
+ "host": result.get("host", "unknown"),
+ }
+
+ if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
+ summary["performance_summary"]["best_insert_qps"] = {
+ "value": insert_qps,
+ "config": config_name,
+ }
+
+ if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
+ summary["performance_summary"]["best_query_qps"] = {
+ "value": query_qps,
+ "config": config_name,
+ }
+
+ summary["performance_summary"]["average_insert_qps"] = (
+ np.mean(all_insert_qps) if all_insert_qps else 0
+ )
+ summary["performance_summary"]["average_query_qps"] = (
+ np.mean(all_query_qps) if all_query_qps else 0
+ )
+
+ # Save summary
+ with open(os.path.join(output_dir, "summary.json"), "w") as f:
+ json.dump(summary, f, indent=2)
+
+ return summary
+
+
+def create_comprehensive_fs_comparison(results, output_dir):
+ """Create comprehensive filesystem performance comparison including all configurations"""
+ import matplotlib.pyplot as plt
+ import numpy as np
+ from collections import defaultdict
+
+ # Collect data for all filesystem configurations
+ config_data = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "")
+
+ # Create configuration label
+ if block_size and block_size != "default":
+ config_label = f"{fs}-{block_size}"
+ else:
+ config_label = fs
+
+ category = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Extract performance metrics
+ if "insert_performance" in result:
+ insert_qps = result["insert_performance"].get("vectors_per_second", 0)
+ else:
+ insert_qps = 0
+
+ config_data[config_label][category].append(insert_qps)
+
+ # Sort configurations for consistent display
+ configs = sorted(config_data.keys())
+
+ # Calculate means and standard deviations
+ baseline_means = []
+ baseline_stds = []
+ dev_means = []
+ dev_stds = []
+
+ for config in configs:
+ baseline_vals = config_data[config]["baseline"]
+ dev_vals = config_data[config]["dev"]
+
+ baseline_means.append(np.mean(baseline_vals) if baseline_vals else 0)
+ baseline_stds.append(np.std(baseline_vals) if baseline_vals else 0)
+ dev_means.append(np.mean(dev_vals) if dev_vals else 0)
+ dev_stds.append(np.std(dev_vals) if dev_vals else 0)
+
+ # Create the plot
+ fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
+
+ x = np.arange(len(configs))
+ width = 0.35
+
+ # Top plot: Absolute performance
+ baseline_bars = ax1.bar(
+ x - width / 2,
+ baseline_means,
+ width,
+ yerr=baseline_stds,
+ label="Baseline",
+ color="#1f77b4",
+ capsize=5,
+ )
+ dev_bars = ax1.bar(
+ x + width / 2,
+ dev_means,
+ width,
+ yerr=dev_stds,
+ label="Development",
+ color="#ff7f0e",
+ capsize=5,
+ )
+
+ ax1.set_ylabel("Insert QPS")
+ ax1.set_title("Vector Database Performance Across Filesystem Configurations")
+ ax1.set_xticks(x)
+ ax1.set_xticklabels(configs, rotation=45, ha="right")
+ ax1.legend()
+ ax1.grid(True, alpha=0.3)
+
+ # Add value labels on bars
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax1.annotate(
+ f"{height:.0f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
+
+ # Bottom plot: Percentage improvement (dev vs baseline)
+ improvements = []
+ for i in range(len(configs)):
+ if baseline_means[i] > 0:
+ improvement = ((dev_means[i] - baseline_means[i]) / baseline_means[i]) * 100
+ else:
+ improvement = 0
+ improvements.append(improvement)
+
+ colors = ["green" if x > 0 else "red" for x in improvements]
+ improvement_bars = ax2.bar(x, improvements, color=colors, alpha=0.7)
+
+ ax2.set_ylabel("Performance Change (%)")
+ ax2.set_title("Development vs Baseline Performance Change")
+ ax2.set_xticks(x)
+ ax2.set_xticklabels(configs, rotation=45, ha="right")
+ ax2.axhline(y=0, color="black", linestyle="-", linewidth=0.5)
+ ax2.grid(True, alpha=0.3)
+
+ # Add percentage labels
+ for bar, val in zip(improvement_bars, improvements):
+ ax2.annotate(
+ f"{val:.1f}%",
+ xy=(bar.get_x() + bar.get_width() / 2, val),
+ xytext=(0, 3 if val > 0 else -15),
+ textcoords="offset points",
+ ha="center",
+ va="bottom" if val > 0 else "top",
+ fontsize=8,
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "comprehensive_fs_comparison.png"), dpi=150)
+ plt.close()
+
+
+def create_fs_latency_comparison(results, output_dir):
+ """Create latency comparison across filesystems"""
+ import matplotlib.pyplot as plt
+ import numpy as np
+ from collections import defaultdict
+
+ # Collect latency data
+ config_latency = defaultdict(lambda: {"baseline": [], "dev": []})
+
+ for result in results:
+ fs = result.get("filesystem", "unknown")
+ block_size = result.get("block_size", "")
+
+ if block_size and block_size != "default":
+ config_label = f"{fs}-{block_size}"
+ else:
+ config_label = fs
+
+ category = "dev" if result.get("is_dev", False) else "baseline"
+
+ # Extract latency metrics
+ if "query_performance" in result:
+ latency_p99 = result["query_performance"].get("latency_p99_ms", 0)
+ else:
+ latency_p99 = 0
+
+ if latency_p99 > 0:
+ config_latency[config_label][category].append(latency_p99)
+
+ if not config_latency:
+ return
+
+ # Sort configurations
+ configs = sorted(config_latency.keys())
+
+ # Calculate statistics
+ baseline_p99 = []
+ dev_p99 = []
+
+ for config in configs:
+ baseline_vals = config_latency[config]["baseline"]
+ dev_vals = config_latency[config]["dev"]
+
+ baseline_p99.append(np.mean(baseline_vals) if baseline_vals else 0)
+ dev_p99.append(np.mean(dev_vals) if dev_vals else 0)
+
+ # Create plot
+ fig, ax = plt.subplots(figsize=(12, 6))
+
+ x = np.arange(len(configs))
+ width = 0.35
+
+ baseline_bars = ax.bar(
+ x - width / 2, baseline_p99, width, label="Baseline P99", color="#9467bd"
+ )
+ dev_bars = ax.bar(
+ x + width / 2, dev_p99, width, label="Development P99", color="#e377c2"
+ )
+
+ ax.set_xlabel("Filesystem Configuration")
+ ax.set_ylabel("Latency P99 (ms)")
+ ax.set_title("Query Latency (P99) Comparison Across Filesystems")
+ ax.set_xticks(x)
+ ax.set_xticklabels(configs, rotation=45, ha="right")
+ ax.legend()
+ ax.grid(True, alpha=0.3)
+
+ # Add value labels
+ for bars in [baseline_bars, dev_bars]:
+ for bar in bars:
+ height = bar.get_height()
+ if height > 0:
+ ax.annotate(
+ f"{height:.1f}",
+ xy=(bar.get_x() + bar.get_width() / 2, height),
+ xytext=(0, 3),
+ textcoords="offset points",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "filesystem_latency_comparison.png"), dpi=150)
+ plt.close()
+
+
+def main():
+ if len(sys.argv) < 3:
+ print("Usage: generate_graphs.py <results_dir> <output_dir>")
+ sys.exit(1)
+
+ results_dir = sys.argv[1]
+ output_dir = sys.argv[2]
+
+ # Create output directory
+ os.makedirs(output_dir, exist_ok=True)
+
+ # Load results
+ results = load_results(results_dir)
+
+ if not results:
+ print("No results found to analyze")
+ sys.exit(1)
+
+ print(f"Loaded {len(results)} result files")
+
+ # Generate graphs
+ print("Generating performance heatmap...")
+ create_heatmap_analysis(results, output_dir)
+
+ print("Generating performance trends...")
+ create_simple_performance_trends(results, output_dir)
+
+ print("Generating summary statistics...")
+ summary = generate_summary_statistics(results, output_dir)
+
+ # Check if we have multiple filesystems to compare
+ filesystems = set(r.get("filesystem", "unknown") for r in results)
+ if len(filesystems) > 1:
+ print("Generating filesystem comparison chart...")
+ create_filesystem_comparison_chart(results, output_dir)
+
+ print("Generating comprehensive filesystem comparison...")
+ create_comprehensive_fs_comparison(results, output_dir)
+
+ print("Generating filesystem latency comparison...")
+ create_fs_latency_comparison(results, output_dir)
+
+ # Check if we have XFS results with different block sizes
+ xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
+ block_sizes = set(r.get("block_size", "unknown") for r in xfs_results)
+ if len(block_sizes) > 1:
+ print("Generating XFS block size analysis...")
+ create_block_size_analysis(results, output_dir)
+
+ print(f"\nAnalysis complete! Graphs saved to {output_dir}")
+ print(f"Total configurations tested: {summary['total_tests']}")
+ print(
+ f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
+ )
+ print(
+ f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/workflows/ai/scripts/generate_html_report.py b/workflows/ai/scripts/generate_html_report.py
new file mode 100755
index 00000000..3aa8342f
--- /dev/null
+++ b/workflows/ai/scripts/generate_html_report.py
@@ -0,0 +1,558 @@
+#!/usr/bin/env python3
+"""
+Generate HTML report for AI benchmark results
+"""
+
+import json
+import os
+import sys
+import glob
+from datetime import datetime
+from pathlib import Path
+
+HTML_TEMPLATE = """
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>AI Benchmark Results - {timestamp}</title>
+ <style>
+ body {{
+ font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
+ line-height: 1.6;
+ color: #333;
+ max-width: 1400px;
+ margin: 0 auto;
+ padding: 20px;
+ background-color: #f5f5f5;
+ }}
+ .header {{
+ background-color: #2c3e50;
+ color: white;
+ padding: 30px;
+ border-radius: 8px;
+ margin-bottom: 30px;
+ text-align: center;
+ }}
+ h1 {{
+ margin: 0;
+ font-size: 2.5em;
+ }}
+ .subtitle {{
+ margin-top: 10px;
+ opacity: 0.9;
+ }}
+ .summary-cards {{
+ display: grid;
+ grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
+ gap: 20px;
+ margin-bottom: 40px;
+ }}
+ .card {{
+ background: white;
+ padding: 20px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ text-align: center;
+ }}
+ .card h3 {{
+ margin: 0 0 10px 0;
+ color: #2c3e50;
+ }}
+ .card .value {{
+ font-size: 2em;
+ font-weight: bold;
+ color: #3498db;
+ }}
+ .card .label {{
+ color: #7f8c8d;
+ font-size: 0.9em;
+ }}
+ .config-box {{
+ background: #f8f9fa;
+ border-left: 4px solid #3498db;
+ padding: 15px;
+ margin: 20px 0;
+ border-radius: 4px;
+ }}
+ .config-box h3 {{
+ margin-top: 0;
+ color: #2c3e50;
+ }}
+ .config-box ul {{
+ margin: 10px 0;
+ padding-left: 20px;
+ }}
+ .config-box li {{
+ margin: 5px 0;
+ }}
+ .section {{
+ background: white;
+ padding: 30px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ margin-bottom: 30px;
+ }}
+ .section h2 {{
+ color: #2c3e50;
+ border-bottom: 2px solid #3498db;
+ padding-bottom: 10px;
+ margin-bottom: 20px;
+ }}
+ .graph-container {{
+ text-align: center;
+ margin: 20px 0;
+ }}
+ .graph-container img {{
+ max-width: 100%;
+ height: auto;
+ border-radius: 4px;
+ box-shadow: 0 2px 8px rgba(0,0,0,0.1);
+ }}
+ .results-table {{
+ width: 100%;
+ border-collapse: collapse;
+ margin-top: 20px;
+ }}
+ .results-table th, .results-table td {{
+ padding: 12px;
+ text-align: left;
+ border-bottom: 1px solid #ddd;
+ }}
+ .results-table th {{
+ background-color: #f8f9fa;
+ font-weight: 600;
+ color: #2c3e50;
+ }}
+ .results-table tr:hover {{
+ background-color: #f8f9fa;
+ }}
+ .baseline {{
+ background-color: #e8f4fd;
+ }}
+ .dev {{
+ background-color: #fff3cd;
+ }}
+ .footer {{
+ text-align: center;
+ padding: 20px;
+ color: #7f8c8d;
+ font-size: 0.9em;
+ }}
+ .graph-grid {{
+ display: grid;
+ grid-template-columns: repeat(auto-fit, minmax(500px, 1fr));
+ gap: 20px;
+ margin: 20px 0;
+ }}
+ .best-config {{
+ background-color: #d4edda;
+ font-weight: bold;
+ }}
+ .navigation {{
+ position: sticky;
+ top: 20px;
+ background: white;
+ padding: 20px;
+ border-radius: 8px;
+ box-shadow: 0 2px 4px rgba(0,0,0,0.1);
+ margin-bottom: 30px;
+ }}
+ .navigation ul {{
+ list-style: none;
+ padding: 0;
+ margin: 0;
+ }}
+ .navigation li {{
+ display: inline-block;
+ margin-right: 20px;
+ }}
+ .navigation a {{
+ color: #3498db;
+ text-decoration: none;
+ font-weight: 500;
+ }}
+ .navigation a:hover {{
+ text-decoration: underline;
+ }}
+ </style>
+</head>
+<body>
+ <div class="header">
+ <h1>AI Vector Database Benchmark Results</h1>
+ <div class="subtitle">Generated on {timestamp}</div>
+ </div>
+
+ <nav class="navigation">
+ <ul>
+ <li><a href="#summary">Summary</a></li>
+ {filesystem_nav_items}
+ <li><a href="#performance-metrics">Performance Metrics</a></li>
+ <li><a href="#performance-heatmap">Performance Heatmap</a></li>
+ <li><a href="#detailed-results">Detailed Results</a></li>
+ </ul>
+ </nav>
+
+ <div id="summary" class="summary-cards">
+ <div class="card">
+ <h3>Total Tests</h3>
+ <div class="value">{total_tests}</div>
+ <div class="label">Configurations</div>
+ </div>
+ <div class="card">
+ <h3>Best Insert QPS</h3>
+ <div class="value">{best_insert_qps}</div>
+ <div class="label">{best_insert_config}</div>
+ </div>
+ <div class="card">
+ <h3>Best Query QPS</h3>
+ <div class="value">{best_query_qps}</div>
+ <div class="label">{best_query_config}</div>
+ </div>
+ <div class="card">
+ <h3>{fourth_card_title}</h3>
+ <div class="value">{fourth_card_value}</div>
+ <div class="label">{fourth_card_label}</div>
+ </div>
+ </div>
+
+ {filesystem_comparison_section}
+
+ {block_size_analysis_section}
+
+ <div id="performance-heatmap" class="section">
+ <h2>Performance Heatmap</h2>
+ <p>Heatmap visualization showing performance metrics across all tested configurations.</p>
+ <div class="graph-container">
+ <img src="graphs/performance_heatmap.png" alt="Performance Heatmap">
+ </div>
+ </div>
+
+ <div id="performance-metrics" class="section">
+ <h2>Performance Metrics</h2>
+ {config_summary}
+ <div class="graph-grid">
+ {performance_trend_graphs}
+ </div>
+ </div>
+
+ <div id="detailed-results" class="section">
+ <h2>Detailed Results Table</h2>
+ <table class="results-table">
+ <thead>
+ <tr>
+ <th>Host</th>
+ <th>Type</th>
+ <th>Insert QPS</th>
+ <th>Query QPS</th>
+ <th>Timestamp</th>
+ </tr>
+ </thead>
+ <tbody>
+ {table_rows}
+ </tbody>
+ </table>
+ </div>
+
+ <div class="footer">
+ <p>Generated by kdevops AI Benchmark Suite | <a href="https://github.com/linux-kdevops/kdevops">GitHub</a></p>
+ </div>
+</body>
+</html>
+"""
+
+
+def load_summary(graphs_dir):
+ """Load the summary.json file"""
+ summary_path = os.path.join(graphs_dir, "summary.json")
+ if os.path.exists(summary_path):
+ with open(summary_path, "r") as f:
+ return json.load(f)
+ return None
+
+
+def load_results(results_dir):
+ """Load all result files for detailed table"""
+ results = []
+ json_files = glob.glob(os.path.join(results_dir, "*.json"))
+
+ for json_file in json_files:
+ try:
+ with open(json_file, "r") as f:
+ data = json.load(f)
+ # Get filesystem from JSON data first, then fallback to filename parsing
+ filename = os.path.basename(json_file)
+
+ # Skip results without valid performance data
+ insert_perf = data.get("insert_performance", {})
+ query_perf = data.get("query_performance", {})
+ if not insert_perf or not query_perf:
+ continue
+
+ # Get filesystem from JSON data
+ fs_type = data.get("filesystem", None)
+
+ # If not in JSON, try to parse from filename (backwards compatibility)
+ if not fs_type and "debian13-ai" in filename:
+ host_parts = (
+ filename.replace("results_debian13-ai-", "")
+ .replace("_1.json", "")
+ .replace("_2.json", "")
+ .replace("_3.json", "")
+ .split("-")
+ )
+ if "xfs" in host_parts[0]:
+ fs_type = "xfs"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "ext4" in host_parts[0]:
+ fs_type = "ext4"
+ block_size = host_parts[1] if len(host_parts) > 1 else "4k"
+ elif "btrfs" in host_parts[0]:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ fs_type = "unknown"
+ block_size = "unknown"
+ else:
+ # Set appropriate block size based on filesystem
+ if fs_type == "btrfs":
+ block_size = "default"
+ else:
+ block_size = data.get("block_size", "default")
+
+ # Default to unknown if still not found
+ if not fs_type:
+ fs_type = "unknown"
+ block_size = "unknown"
+
+ is_dev = "dev" in filename
+
+ # Calculate average QPS from query performance data
+ query_qps = 0
+ query_count = 0
+ for topk_data in query_perf.values():
+ for batch_data in topk_data.values():
+ qps = batch_data.get("queries_per_second", 0)
+ if qps > 0:
+ query_qps += qps
+ query_count += 1
+ if query_count > 0:
+ query_qps = query_qps / query_count
+
+ results.append(
+ {
+ "host": filename.replace("results_", "").replace(".json", ""),
+ "filesystem": fs_type,
+ "block_size": block_size,
+ "type": "Development" if is_dev else "Baseline",
+ "insert_qps": insert_perf.get("vectors_per_second", 0),
+ "query_qps": query_qps,
+ "timestamp": data.get("timestamp", "N/A"),
+ "is_dev": is_dev,
+ }
+ )
+ except Exception as e:
+ print(f"Error loading {json_file}: {e}")
+
+ # Sort by filesystem, block size, then type
+ results.sort(key=lambda x: (x["filesystem"], x["block_size"], x["type"]))
+ return results
+
+
+def generate_table_rows(results, best_configs):
+ """Generate HTML table rows"""
+ rows = []
+ for result in results:
+ config_key = f"{result['filesystem']}-{result['block_size']}-{'dev' if result['is_dev'] else 'baseline'}"
+ row_class = "dev" if result["is_dev"] else "baseline"
+
+ # Check if this is a best configuration
+ if config_key in best_configs:
+ row_class += " best-config"
+
+ row = f"""
+ <tr class="{row_class}">
+ <td>{result['host']}</td>
+ <td>{result['type']}</td>
+ <td>{result['insert_qps']:,}</td>
+ <td>{result['query_qps']:,}</td>
+ <td>{result['timestamp']}</td>
+ </tr>
+ """
+ rows.append(row)
+
+ return "\n".join(rows)
+
+
+def generate_config_summary(results_dir):
+ """Generate configuration summary HTML from results"""
+ # Try to load first result file to get configuration
+ result_files = glob.glob(os.path.join(results_dir, "results_*.json"))
+ if not result_files:
+ return ""
+
+ try:
+ with open(result_files[0], "r") as f:
+ data = json.load(f)
+ config = data.get("config", {})
+
+ # Format configuration details
+ config_html = """
+ <div class="config-box">
+ <h3>Test Configuration</h3>
+ <ul>
+ <li><strong>Vector Dataset Size:</strong> {:,} vectors</li>
+ <li><strong>Vector Dimensions:</strong> {}</li>
+ <li><strong>Index Type:</strong> {} (M={}, ef_construction={}, ef={})</li>
+ <li><strong>Benchmark Runtime:</strong> {} seconds</li>
+ <li><strong>Batch Size:</strong> {:,}</li>
+ <li><strong>Test Iterations:</strong> {} runs with identical configuration</li>
+ </ul>
+ </div>
+ """.format(
+ config.get("vector_dataset_size", "N/A"),
+ config.get("vector_dimensions", "N/A"),
+ config.get("index_type", "N/A"),
+ config.get("index_hnsw_m", "N/A"),
+ config.get("index_hnsw_ef_construction", "N/A"),
+ config.get("index_hnsw_ef", "N/A"),
+ config.get("benchmark_runtime", "N/A"),
+ config.get("batch_size", "N/A"),
+ len(result_files),
+ )
+ return config_html
+ except Exception as e:
+ print(f"Warning: Could not generate config summary: {e}")
+ return ""
+
+
+def find_performance_trend_graphs(graphs_dir):
+ """Find performance trend graphs"""
+ graphs = []
+ # Look for filesystem-specific graphs in multi-fs mode
+ for fs in ["xfs", "ext4", "btrfs"]:
+ graph_path = f"{fs}_performance_trends.png"
+ if os.path.exists(os.path.join(graphs_dir, graph_path)):
+ graphs.append(
+ f'<div class="graph-container"><img src="graphs/{graph_path}" alt="{fs.upper()} Performance Trends"></div>'
+ )
+ # Fallback to simple performance trends for single mode
+ if not graphs and os.path.exists(
+ os.path.join(graphs_dir, "performance_trends.png")
+ ):
+ graphs.append(
+ '<div class="graph-container"><img src="graphs/performance_trends.png" alt="Performance Trends"></div>'
+ )
+ return "\n".join(graphs)
+
+
+def generate_html_report(results_dir, graphs_dir, output_path):
+ """Generate the HTML report"""
+ # Load summary
+ summary = load_summary(graphs_dir)
+ if not summary:
+ print("Warning: No summary.json found")
+ summary = {
+ "total_tests": 0,
+ "filesystems_tested": [],
+ "performance_summary": {
+ "best_insert_qps": {"value": 0, "config": "N/A"},
+ "best_query_qps": {"value": 0, "config": "N/A"},
+ },
+ }
+
+ # Load detailed results
+ results = load_results(results_dir)
+
+ # Find best configurations
+ best_configs = set()
+ if summary["performance_summary"]["best_insert_qps"]["config"]:
+ best_configs.add(summary["performance_summary"]["best_insert_qps"]["config"])
+ if summary["performance_summary"]["best_query_qps"]["config"]:
+ best_configs.add(summary["performance_summary"]["best_query_qps"]["config"])
+
+ # Check if multi-filesystem testing is enabled (more than one filesystem)
+ filesystems_tested = summary.get("filesystems_tested", [])
+ is_multifs_enabled = len(filesystems_tested) > 1
+
+ # Generate conditional sections based on multi-fs status
+ if is_multifs_enabled:
+ filesystem_nav_items = """
+ <li><a href="#filesystem-comparison">Filesystem Comparison</a></li>
+ <li><a href="#block-size-analysis">Block Size Analysis</a></li>"""
+
+ filesystem_comparison_section = """<div id="filesystem-comparison" class="section">
+ <h2>Filesystem Performance Comparison</h2>
+ <p>Comparison of vector database performance across different filesystems, showing both baseline and development kernel results.</p>
+ <div class="graph-container">
+ <img src="graphs/filesystem_comparison.png" alt="Filesystem Comparison">
+ </div>
+ </div>"""
+
+ block_size_analysis_section = """<div id="block-size-analysis" class="section">
+ <h2>XFS Block Size Analysis</h2>
+ <p>Performance analysis of XFS filesystem with different block sizes (4K, 16K, 32K, 64K).</p>
+ <div class="graph-container">
+ <img src="graphs/xfs_block_size_analysis.png" alt="XFS Block Size Analysis">
+ </div>
+ </div>"""
+
+ # Multi-fs mode: show filesystem info
+ fourth_card_title = "Filesystems Tested"
+ fourth_card_value = str(len(filesystems_tested))
+ fourth_card_label = ", ".join(filesystems_tested).upper()
+ else:
+ # Single filesystem mode - hide multi-fs sections
+ filesystem_nav_items = ""
+ filesystem_comparison_section = ""
+ block_size_analysis_section = ""
+
+ # Single mode: show test iterations
+ fourth_card_title = "Test Iterations"
+ fourth_card_value = str(summary["total_tests"])
+ fourth_card_label = "Identical Configuration Runs"
+
+ # Generate configuration summary
+ config_summary = generate_config_summary(results_dir)
+
+ # Generate HTML
+ html_content = HTML_TEMPLATE.format(
+ timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
+ total_tests=summary["total_tests"],
+ best_insert_qps=f"{summary['performance_summary']['best_insert_qps']['value']:,}",
+ best_insert_config=summary["performance_summary"]["best_insert_qps"]["config"],
+ best_query_qps=f"{summary['performance_summary']['best_query_qps']['value']:,}",
+ best_query_config=summary["performance_summary"]["best_query_qps"]["config"],
+ fourth_card_title=fourth_card_title,
+ fourth_card_value=fourth_card_value,
+ fourth_card_label=fourth_card_label,
+ filesystem_nav_items=filesystem_nav_items,
+ filesystem_comparison_section=filesystem_comparison_section,
+ block_size_analysis_section=block_size_analysis_section,
+ config_summary=config_summary,
+ performance_trend_graphs=find_performance_trend_graphs(graphs_dir),
+ table_rows=generate_table_rows(results, best_configs),
+ )
+
+ # Write HTML file
+ with open(output_path, "w") as f:
+ f.write(html_content)
+
+ print(f"HTML report generated: {output_path}")
+
+
+def main():
+ if len(sys.argv) < 4:
+ print("Usage: generate_html_report.py <results_dir> <graphs_dir> <output_html>")
+ sys.exit(1)
+
+ results_dir = sys.argv[1]
+ graphs_dir = sys.argv[2]
+ output_html = sys.argv[3]
+
+ generate_html_report(results_dir, graphs_dir, output_html)
+
+
+if __name__ == "__main__":
+ main()
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
2025-08-27 9:31 [PATCH 0/2] kdevops: add milvus with minio support Luis Chamberlain
2025-08-27 9:32 ` [PATCH 1/2] ai: add Milvus vector database benchmarking support Luis Chamberlain
@ 2025-08-27 9:32 ` Luis Chamberlain
2025-08-27 14:47 ` Chuck Lever
2025-09-01 20:11 ` Daniel Gomez
2025-08-29 2:05 ` [PATCH 0/2] kdevops: add milvus with minio support Luis Chamberlain
2 siblings, 2 replies; 8+ messages in thread
From: Luis Chamberlain @ 2025-08-27 9:32 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
Cc: Luis Chamberlain
Extend the AI workflow to support testing Milvus across multiple
filesystem configurations simultaneously. This enables comprehensive
performance comparisons between different filesystems and their
configuration options.
Key features:
- Dynamic node generation based on enabled filesystem configurations
- Support for XFS, EXT4, and BTRFS with various mount options
- Per-filesystem result collection and analysis
- A/B testing across all filesystem configurations
- Automated comparison graphs between filesystems
Filesystem configurations:
- XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
- EXT4: default, nojournal, bigalloc configurations
- BTRFS: default, zlib, lzo, zstd compression options
Defconfigs:
- ai-milvus-multifs: Test 7 filesystem configs with A/B testing
- ai-milvus-multifs-distro: Test with distribution kernels
- ai-milvus-multifs-extended: Extended configs (14 filesystems total)
Node generation:
The system dynamically generates nodes based on enabled filesystem
configurations. With A/B testing enabled, this creates baseline and
dev nodes for each filesystem (e.g., debian13-ai-xfs-4k and
debian13-ai-xfs-4k-dev).
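For illustration only, the name expansion amounts to pairing every enabled
filesystem configuration label with an optional -dev suffix. A minimal
Python sketch of the idea (the function and the example labels are
hypothetical; the actual generation is handled by the gen_nodes role):

  def gen_node_names(base="debian13-ai", configs=(), ab_testing=True):
      # configs are filesystem configuration labels, e.g. "xfs-4k"
      nodes = []
      for cfg in configs:
          nodes.append(f"{base}-{cfg}")           # baseline node
          if ab_testing:
              nodes.append(f"{base}-{cfg}-dev")   # dev node for A/B runs
      return nodes

  # gen_node_names(configs=["xfs-4k", "ext4-16k-bigalloc", "btrfs-default"])
  # -> debian13-ai-xfs-4k, debian13-ai-xfs-4k-dev, ...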
Usage:
make defconfig-ai-milvus-multifs
make bringup # Creates nodes for each filesystem
make ai # Setup infrastructure on all nodes
make ai-tests # Run benchmarks on all filesystems
make ai-results # Collect and compare results
This enables systematic evaluation of how different filesystems and
their configurations affect vector database performance.
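As a rough guide to how the comparison graphs quantify an A/B result: the
per-configuration delta is the relative change of the development mean over
the baseline mean, which is the same formula the graphing script applies.
A minimal sketch (names are illustrative only):

  import statistics

  def ab_delta_percent(baseline_qps, dev_qps):
      base = statistics.mean(baseline_qps) if baseline_qps else 0
      dev = statistics.mean(dev_qps) if dev_qps else 0
      if base == 0:
          return 0.0
      return (dev - base) / base * 100.0

A positive value means the development kernel improved throughput for that
filesystem configuration; a negative value indicates a regression.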
Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
.github/workflows/docker-tests.yml | 6 +
Makefile | 2 +-
defconfigs/ai-milvus-multifs | 67 +
defconfigs/ai-milvus-multifs-distro | 109 ++
defconfigs/ai-milvus-multifs-extended | 108 ++
docs/ai/vector-databases/README.md | 1 -
playbooks/ai_install.yml | 6 +
playbooks/ai_multifs.yml | 24 +
.../host_vars/debian13-ai-xfs-4k-4ks.yml | 10 -
.../files/analyze_results.py | 1132 +++++++++++---
.../files/generate_better_graphs.py | 16 +-
.../files/generate_graphs.py | 888 ++++-------
.../files/generate_html_report.py | 263 +++-
.../roles/ai_collect_results/tasks/main.yml | 42 +-
.../templates/analysis_config.json.j2 | 2 +-
.../roles/ai_milvus_storage/tasks/main.yml | 161 ++
.../tasks/generate_comparison.yml | 279 ++++
playbooks/roles/ai_multifs_run/tasks/main.yml | 23 +
.../tasks/run_single_filesystem.yml | 104 ++
.../templates/milvus_config.json.j2 | 42 +
.../roles/ai_multifs_setup/defaults/main.yml | 49 +
.../roles/ai_multifs_setup/tasks/main.yml | 70 +
.../files/milvus_benchmark.py | 164 +-
playbooks/roles/gen_hosts/tasks/main.yml | 19 +
.../roles/gen_hosts/templates/fstests.j2 | 2 +
playbooks/roles/gen_hosts/templates/gitr.j2 | 2 +
playbooks/roles/gen_hosts/templates/hosts.j2 | 35 +-
.../roles/gen_hosts/templates/nfstest.j2 | 2 +
playbooks/roles/gen_hosts/templates/pynfs.j2 | 2 +
playbooks/roles/gen_nodes/tasks/main.yml | 90 ++
.../roles/guestfs/tasks/bringup/main.yml | 15 +
scripts/guestfs.Makefile | 2 +-
workflows/ai/Kconfig | 13 +
workflows/ai/Kconfig.fs | 118 ++
workflows/ai/Kconfig.multifs | 184 +++
workflows/ai/scripts/analysis_config.json | 2 +-
workflows/ai/scripts/analyze_results.py | 1132 +++++++++++---
workflows/ai/scripts/generate_graphs.py | 1372 ++++-------------
workflows/ai/scripts/generate_html_report.py | 94 +-
39 files changed, 4356 insertions(+), 2296 deletions(-)
create mode 100644 defconfigs/ai-milvus-multifs
create mode 100644 defconfigs/ai-milvus-multifs-distro
create mode 100644 defconfigs/ai-milvus-multifs-extended
create mode 100644 playbooks/ai_multifs.yml
delete mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
create mode 100644 playbooks/roles/ai_milvus_storage/tasks/main.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/main.yml
create mode 100644 playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
create mode 100644 playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
create mode 100644 playbooks/roles/ai_multifs_setup/defaults/main.yml
create mode 100644 playbooks/roles/ai_multifs_setup/tasks/main.yml
create mode 100644 workflows/ai/Kconfig.fs
create mode 100644 workflows/ai/Kconfig.multifs
diff --git a/.github/workflows/docker-tests.yml b/.github/workflows/docker-tests.yml
index c0e0d03d..adea1182 100644
--- a/.github/workflows/docker-tests.yml
+++ b/.github/workflows/docker-tests.yml
@@ -53,3 +53,9 @@ jobs:
echo "Running simple make targets on ${{ matrix.distro_container }} environment"
make mrproper
+ - name: Test fio-tests defconfig
+ run: |
+ echo "Testing fio-tests CI configuration"
+ make defconfig-fio-tests-ci
+ make
+ echo "Configuration test passed for fio-tests"
diff --git a/Makefile b/Makefile
index 8755577e..83c67340 100644
--- a/Makefile
+++ b/Makefile
@@ -226,7 +226,7 @@ include scripts/bringup.Makefile
endif
DEFAULT_DEPS += $(ANSIBLE_INVENTORY_FILE)
-$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE)
+$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE) $(KDEVOPS_NODES)
$(Q)ANSIBLE_LOCALHOST_WARNING=False ANSIBLE_INVENTORY_UNPARSED_WARNING=False \
ansible-playbook $(ANSIBLE_VERBOSE) \
$(KDEVOPS_PLAYBOOKS_DIR)/gen_hosts.yml \
diff --git a/defconfigs/ai-milvus-multifs b/defconfigs/ai-milvus-multifs
new file mode 100644
index 00000000..7e5ad971
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs
@@ -0,0 +1,67 @@
+CONFIG_GUESTFS=y
+CONFIG_LIBVIRT=y
+
+# Disable mirror features for CI/testing
+# CONFIG_ENABLE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_USE_LOCAL_LINUX_MIRROR is not set
+# CONFIG_INSTALL_ONLY_GIT_DAEMON is not set
+# CONFIG_MIRROR_INSTALL is not set
+
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+
+CONFIG_BOOTLINUX=y
+CONFIG_BOOTLINUX_9P=y
+
+# Enable A/B testing with different kernel references
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
+
+# AI workflow configuration
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# Vector database configuration
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Enable multi-filesystem testing
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+
+# Test XFS with different block sizes
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# Test EXT4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# Test BTRFS
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Performance settings
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_ITERATIONS=5
+
+# Dataset configuration for benchmarking
+CONFIG_AI_VECTOR_DB_MILVUS_DATASET_SIZE=100000
+CONFIG_AI_VECTOR_DB_MILVUS_BATCH_SIZE=10000
+CONFIG_AI_VECTOR_DB_MILVUS_NUM_QUERIES=10000
+
+# Container configuration
+CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5=y
+CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
+CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
\ No newline at end of file
diff --git a/defconfigs/ai-milvus-multifs-distro b/defconfigs/ai-milvus-multifs-distro
new file mode 100644
index 00000000..fb71f2b5
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-distro
@@ -0,0 +1,109 @@
+# AI Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k, 16k, 32k, and 64k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile) using the distribution kernel without A/B testing.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_MILVUS_DOCKER=y
+CONFIG_AI_VECTOR_DB_TYPE_MILVUS=y
+
+# Milvus Configuration
+CONFIG_AI_MILVUS_HOST="localhost"
+CONFIG_AI_MILVUS_PORT=19530
+CONFIG_AI_MILVUS_DATABASE_NAME="ai_benchmark"
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# XFS configurations
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Disable A/B testing to use single baseline configuration
+# CONFIG_KDEVOPS_BASELINE_AND_DEV is not set
diff --git a/defconfigs/ai-milvus-multifs-extended b/defconfigs/ai-milvus-multifs-extended
new file mode 100644
index 00000000..7886c8c4
--- /dev/null
+++ b/defconfigs/ai-milvus-multifs-extended
@@ -0,0 +1,108 @@
+# AI Extended Multi-Filesystem Performance Testing Configuration (Distro Kernel)
+# This configuration enables testing AI workloads across multiple filesystem
+# configurations including XFS (4k, 16k, 32k, 64k block sizes), ext4 (4k and 16k bigalloc),
+# and btrfs (default profile), with baseline and dev (A/B) kernel testing enabled.
+
+# Base virtualization setup
+CONFIG_LIBVIRT=y
+CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
+CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
+CONFIG_LIBVIRT_ENABLE_LARGEIO=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
+CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
+
+# Network configuration
+CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
+CONFIG_LIBVIRT_NET_NAME="kdevops"
+
+# Host configuration
+CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
+CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
+
+# Base system requirements
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
+
+# AI Workflow Configuration
+CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
+CONFIG_AI_TESTS_VECTOR_DATABASE=y
+CONFIG_AI_VECTOR_DB_MILVUS=y
+CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
+
+# Test Parameters (optimized for multi-fs testing)
+CONFIG_AI_BENCHMARK_ITERATIONS=3
+CONFIG_AI_DATASET_1M=y
+CONFIG_AI_VECTOR_DIM_128=y
+CONFIG_AI_BENCHMARK_RUNTIME="180"
+CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
+
+# Query patterns
+CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
+CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
+
+# Batch sizes
+CONFIG_AI_BENCHMARK_BATCH_1=y
+CONFIG_AI_BENCHMARK_BATCH_10=y
+
+# Index configuration
+CONFIG_AI_INDEX_HNSW=y
+CONFIG_AI_INDEX_TYPE="HNSW"
+CONFIG_AI_INDEX_HNSW_M=16
+CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
+CONFIG_AI_INDEX_HNSW_EF=64
+
+# Results and visualization
+CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
+CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
+CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
+CONFIG_AI_BENCHMARK_GRAPH_DPI=300
+CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
+
+# Multi-filesystem testing configuration
+CONFIG_AI_MULTIFS_ENABLE=y
+CONFIG_AI_ENABLE_MULTIFS_TESTING=y
+CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
+
+# Enable dedicated Milvus storage with node-based filesystem
+CONFIG_AI_MILVUS_STORAGE_ENABLE=y
+CONFIG_AI_MILVUS_USE_NODE_FS=y
+CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
+CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
+
+# Extended XFS configurations (4k, 16k, 32k, 64k block sizes)
+CONFIG_AI_MULTIFS_TEST_XFS=y
+CONFIG_AI_MULTIFS_XFS_4K_4KS=y
+CONFIG_AI_MULTIFS_XFS_16K_4KS=y
+CONFIG_AI_MULTIFS_XFS_32K_4KS=y
+CONFIG_AI_MULTIFS_XFS_64K_4KS=y
+
+# ext4 configurations
+CONFIG_AI_MULTIFS_TEST_EXT4=y
+CONFIG_AI_MULTIFS_EXT4_4K=y
+CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
+
+# btrfs configurations
+CONFIG_AI_MULTIFS_TEST_BTRFS=y
+CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
+
+# Standard filesystem configuration (for comparison)
+CONFIG_AI_FILESYSTEM_XFS=y
+CONFIG_AI_FILESYSTEM="xfs"
+CONFIG_AI_FSTYPE="xfs"
+CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
+CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+
+# Use distribution kernel (no kernel building)
+# CONFIG_BOOTLINUX is not set
+
+# Memory configuration
+CONFIG_LIBVIRT_MEM_MB=16384
+
+# Baseline/dev testing setup
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+# Build Linux
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
diff --git a/docs/ai/vector-databases/README.md b/docs/ai/vector-databases/README.md
index 2a3955d7..0fdd204b 100644
--- a/docs/ai/vector-databases/README.md
+++ b/docs/ai/vector-databases/README.md
@@ -52,7 +52,6 @@ Vector databases heavily depend on storage performance. The workflow tests acros
- **XFS**: Default for many production deployments
- **ext4**: Traditional Linux filesystem
- **btrfs**: Copy-on-write with compression support
-- **ZFS**: Advanced features for data integrity
## Configuration Dimensions
diff --git a/playbooks/ai_install.yml b/playbooks/ai_install.yml
index 70b734e4..38e6671c 100644
--- a/playbooks/ai_install.yml
+++ b/playbooks/ai_install.yml
@@ -4,5 +4,11 @@
become: true
become_user: root
roles:
+ - role: ai_docker_storage
+ when: ai_docker_storage_enable | default(true)
+ tags: ['ai', 'docker', 'storage']
+ - role: ai_milvus_storage
+ when: ai_milvus_storage_enable | default(false)
+ tags: ['ai', 'milvus', 'storage']
- role: milvus
tags: ['ai', 'vector_db', 'milvus', 'install']
diff --git a/playbooks/ai_multifs.yml b/playbooks/ai_multifs.yml
new file mode 100644
index 00000000..637f11f4
--- /dev/null
+++ b/playbooks/ai_multifs.yml
@@ -0,0 +1,24 @@
+---
+- hosts: baseline
+ become: yes
+ gather_facts: yes
+ vars:
+ ai_benchmark_results_dir: "{{ ai_multifs_results_dir | default('/data/ai-multifs-benchmark') }}"
+ roles:
+ - role: ai_multifs_setup
+ - role: ai_multifs_run
+ tasks:
+ - name: Final multi-filesystem testing summary
+ debug:
+ msg: |
+ Multi-filesystem AI benchmark testing completed!
+
+ Results directory: {{ ai_multifs_results_dir }}
+ Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
+
+ Individual filesystem results:
+ {% for config in ai_multifs_configurations %}
+ {% if config.enabled %}
+ - {{ config.name }}: {{ ai_multifs_results_dir }}/{{ config.name }}/
+ {% endif %}
+ {% endfor %}
diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
deleted file mode 100644
index ffe9eb28..00000000
--- a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
+++ /dev/null
@@ -1,10 +0,0 @@
----
-# XFS 4k block, 4k sector configuration
-ai_docker_fstype: "xfs"
-ai_docker_xfs_blocksize: 4096
-ai_docker_xfs_sectorsize: 4096
-ai_docker_xfs_mkfs_opts: ""
-filesystem_type: "xfs"
-filesystem_block_size: "4k-4ks"
-ai_filesystem: "xfs"
-ai_data_device_path: "/var/lib/docker"
\ No newline at end of file
diff --git a/playbooks/roles/ai_collect_results/files/analyze_results.py b/playbooks/roles/ai_collect_results/files/analyze_results.py
index 3d11fb11..2dc4a1d6 100755
--- a/playbooks/roles/ai_collect_results/files/analyze_results.py
+++ b/playbooks/roles/ai_collect_results/files/analyze_results.py
@@ -226,6 +226,68 @@ class ResultsAnalyzer:
return fs_info
+ def _extract_filesystem_config(
+ self, result: Dict[str, Any]
+ ) -> tuple[str, str, str]:
+ """Extract filesystem type and block size from result data.
+ Returns (fs_type, block_size, config_key)"""
+ filename = result.get("_file", "")
+
+ # Primary: Extract filesystem type from filename (more reliable than JSON)
+ fs_type = "unknown"
+ block_size = "default"
+
+ if "xfs" in filename:
+ fs_type = "xfs"
+ # Check larger sizes first to avoid substring matches
+ if "64k" in filename and "64k-" in filename:
+ block_size = "64k"
+ elif "32k" in filename and "32k-" in filename:
+ block_size = "32k"
+ elif "16k" in filename and "16k-" in filename:
+ block_size = "16k"
+ elif "4k" in filename and "4k-" in filename:
+ block_size = "4k"
+ elif "ext4" in filename:
+ fs_type = "ext4"
+ if "16k" in filename:
+ block_size = "16k"
+ elif "4k" in filename:
+ block_size = "4k"
+ elif "btrfs" in filename:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ # Fallback to JSON data if filename parsing fails
+ fs_type = result.get("filesystem", "unknown")
+ self.logger.warning(
+ f"Could not determine filesystem from filename {filename}, using JSON data: {fs_type}"
+ )
+
+ config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+ return fs_type, block_size, config_key
+
+ def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
+ """Extract node hostname and determine if it's a dev node.
+ Returns (hostname, is_dev_node)"""
+ # Get hostname from system_info (preferred) or fall back to filename
+ system_info = result.get("system_info", {})
+ hostname = system_info.get("hostname", "")
+
+ # If no hostname in system_info, try extracting from filename
+ if not hostname:
+ filename = result.get("_file", "")
+ # Remove results_ prefix and .json suffix
+ hostname = filename.replace("results_", "").replace(".json", "")
+ # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit():
+ hostname = "_".join(hostname.split("_")[:-1])
+
+ # Determine if this is a dev node
+ is_dev = hostname.endswith("-dev")
+
+ return hostname, is_dev
+
def load_results(self) -> bool:
"""Load all result files from the results directory"""
try:
@@ -391,6 +453,8 @@ class ResultsAnalyzer:
html.append(
" .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
)
+ html.append(" .baseline-row { background-color: #e8f5e9; }")
+ html.append(" .dev-row { background-color: #e3f2fd; }")
html.append(" </style>")
html.append("</head>")
html.append("<body>")
@@ -486,26 +550,69 @@ class ResultsAnalyzer:
else:
html.append(" <p>No storage device information available.</p>")
- # Filesystem section
- html.append(" <h3>🗂️ Filesystem Configuration</h3>")
- fs_info = self.system_info.get("filesystem_info", {})
- html.append(" <table class='config-table'>")
- html.append(
- " <tr><td>Filesystem Type</td><td>"
- + str(fs_info.get("filesystem_type", "Unknown"))
- + "</td></tr>"
- )
- html.append(
- " <tr><td>Mount Point</td><td>"
- + str(fs_info.get("mount_point", "Unknown"))
- + "</td></tr>"
- )
- html.append(
- " <tr><td>Mount Options</td><td>"
- + str(fs_info.get("mount_options", "Unknown"))
- + "</td></tr>"
- )
- html.append(" </table>")
+ # Node Configuration section - Extract from actual benchmark results
+ html.append(" <h3>🗂️ Node Configuration</h3>")
+
+ # Collect node and filesystem information from benchmark results
+ node_configs = {}
+ for result in self.results_data:
+ # Extract node information
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(
+ result
+ )
+
+ system_info = result.get("system_info", {})
+ data_path = system_info.get("data_path", "/data/milvus")
+ mount_point = system_info.get("mount_point", "/data")
+ kernel_version = system_info.get("kernel_version", "unknown")
+
+ if hostname not in node_configs:
+ node_configs[hostname] = {
+ "hostname": hostname,
+ "node_type": "Development" if is_dev else "Baseline",
+ "filesystem": fs_type,
+ "block_size": block_size,
+ "data_path": data_path,
+ "mount_point": mount_point,
+ "kernel": kernel_version,
+ "test_count": 0,
+ }
+ node_configs[hostname]["test_count"] += 1
+
+ if node_configs:
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
+ )
+ # Sort nodes with baseline first, then dev
+ sorted_nodes = sorted(
+ node_configs.items(),
+ key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+ )
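+                # (False sorts before True, so Baseline rows precede Development)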
+ for hostname, config_info in sorted_nodes:
+ row_class = (
+ "dev-row"
+ if config_info["node_type"] == "Development"
+ else "baseline-row"
+ )
+ html.append(f" <tr class='{row_class}'>")
+ html.append(f" <td><strong>{hostname}</strong></td>")
+ html.append(f" <td>{config_info['node_type']}</td>")
+ html.append(f" <td>{config_info['filesystem']}</td>")
+ html.append(f" <td>{config_info['block_size']}</td>")
+ html.append(f" <td>{config_info['data_path']}</td>")
+ html.append(
+ f" <td>{config_info['mount_point']}</td>"
+ )
+ html.append(f" <td>{config_info['kernel']}</td>")
+ html.append(f" <td>{config_info['test_count']}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+ else:
+ html.append(
+ " <p>No node configuration data found in results.</p>"
+ )
html.append(" </div>")
# Test Configuration Section
@@ -551,92 +658,192 @@ class ResultsAnalyzer:
html.append(" </table>")
html.append(" </div>")
- # Performance Results Section
+ # Performance Results Section - Per Node
html.append(" <div class='section'>")
- html.append(" <h2>📊 Performance Results Summary</h2>")
+ html.append(" <h2>📊 Performance Results by Node</h2>")
if self.results_data:
- # Insert performance
- insert_times = [
- r.get("insert_performance", {}).get("total_time_seconds", 0)
- for r in self.results_data
- ]
- insert_rates = [
- r.get("insert_performance", {}).get("vectors_per_second", 0)
- for r in self.results_data
- ]
-
- if insert_times and any(t > 0 for t in insert_times):
- html.append(" <h3>📈 Vector Insert Performance</h3>")
- html.append(" <table class='metric-table'>")
- html.append(
- f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
- )
- html.append(
- f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ # Group results by node
+ node_performance = {}
+
+ for result in self.results_data:
+ # Use node hostname as the grouping key
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(
+ result
)
- html.append(
- f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
- )
- html.append(" </table>")
- # Index performance
- index_times = [
- r.get("index_performance", {}).get("creation_time_seconds", 0)
- for r in self.results_data
- ]
- if index_times and any(t > 0 for t in index_times):
- html.append(" <h3>🔗 Index Creation Performance</h3>")
- html.append(" <table class='metric-table'>")
- html.append(
- f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "hostname": hostname,
+ "node_type": "Development" if is_dev else "Baseline",
+ "insert_rates": [],
+ "insert_times": [],
+ "index_times": [],
+ "query_performance": {},
+ "filesystem": fs_type,
+ "block_size": block_size,
+ }
+
+ # Add insert performance
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ rate = insert_perf.get("vectors_per_second", 0)
+ time = insert_perf.get("total_time_seconds", 0)
+ if rate > 0:
+ node_performance[hostname]["insert_rates"].append(rate)
+ if time > 0:
+ node_performance[hostname]["insert_times"].append(time)
+
+ # Add index performance
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ time = index_perf.get("creation_time_seconds", 0)
+ if time > 0:
+ node_performance[hostname]["index_times"].append(time)
+
+ # Collect query performance (use first result for each node)
+ query_perf = result.get("query_performance", {})
+ if (
+ query_perf
+ and not node_performance[hostname]["query_performance"]
+ ):
+ node_performance[hostname]["query_performance"] = query_perf
+
+ # Display results for each node, sorted with baseline first
+ sorted_nodes = sorted(
+ node_performance.items(),
+ key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+ )
+ for hostname, perf_data in sorted_nodes:
+ node_type_badge = (
+ "🔵" if perf_data["node_type"] == "Development" else "🟢"
)
html.append(
- f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
+ f" <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
)
- html.append(" </table>")
-
- # Query performance
- html.append(" <h3>🔍 Query Performance</h3>")
- first_query_perf = self.results_data[0].get("query_performance", {})
- if first_query_perf:
- html.append(" <table>")
html.append(
- " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ f" <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
)
- for topk, topk_data in first_query_perf.items():
- for batch, batch_data in topk_data.items():
- qps = batch_data.get("queries_per_second", 0)
- avg_time = batch_data.get("average_time_seconds", 0) * 1000
-
- # Color coding for performance
- qps_class = ""
- if qps > 1000:
- qps_class = "performance-good"
- elif qps > 100:
- qps_class = "performance-warning"
- else:
- qps_class = "performance-poor"
-
- html.append(f" <tr>")
- html.append(
- f" <td>{topk.replace('topk_', 'Top-')}</td>"
- )
- html.append(
- f" <td>{batch.replace('batch_', 'Batch ')}</td>"
- )
- html.append(
- f" <td class='{qps_class}'>{qps:.2f}</td>"
- )
- html.append(f" <td>{avg_time:.2f}</td>")
- html.append(f" </tr>")
+ # Insert performance
+ insert_rates = perf_data["insert_rates"]
+ if insert_rates:
+ html.append(" <h4>📈 Vector Insert Performance</h4>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Index performance
+ index_times = perf_data["index_times"]
+ if index_times:
+ html.append(" <h4>🔗 Index Creation Performance</h4>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Query performance
+ query_perf = perf_data["query_performance"]
+ if query_perf:
+ html.append(" <h4>🔍 Query Performance</h4>")
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ )
- html.append(" </table>")
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = (
+ batch_data.get("average_time_seconds", 0) * 1000
+ )
+
+ # Color coding for performance
+ qps_class = ""
+ if qps > 1000:
+ qps_class = "performance-good"
+ elif qps > 100:
+ qps_class = "performance-warning"
+ else:
+ qps_class = "performance-poor"
+
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{topk.replace('topk_', 'Top-')}</td>"
+ )
+ html.append(
+ f" <td>{batch.replace('batch_', 'Batch ')}</td>"
+ )
+ html.append(
+ f" <td class='{qps_class}'>{qps:.2f}</td>"
+ )
+ html.append(f" <td>{avg_time:.2f}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+
+ html.append(" <br>") # Add spacing between configurations
- html.append(" </div>")
+ html.append(" </div>")
# Footer
+ # Performance Graphs Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📈 Performance Visualizations</h2>")
+ html.append(
+ " <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
+ )
+ html.append(" <ul>")
+ html.append(
+ " <li><strong>Insert Performance:</strong> Shows vector insertion rates and times for each filesystem configuration</li>"
+ )
+ html.append(
+ " <li><strong>Query Performance:</strong> Displays query performance heatmaps for different Top-K and batch sizes</li>"
+ )
+ html.append(
+ " <li><strong>Index Performance:</strong> Compares index creation times across filesystems</li>"
+ )
+ html.append(
+ " <li><strong>Performance Matrix:</strong> Comprehensive comparison matrix of all metrics</li>"
+ )
+ html.append(
+ " <li><strong>Filesystem Comparison:</strong> Side-by-side comparison of filesystem performance</li>"
+ )
+ html.append(" </ul>")
+ html.append(
+ " <p><em>Note: Graphs are generated as separate PNG files in the same directory as this report.</em></p>"
+ )
+ html.append(" <div style='margin-top: 20px;'>")
+ html.append(
+ " <img src='insert_performance.png' alt='Insert Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='query_performance.png' alt='Query Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='index_performance.png' alt='Index Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='performance_matrix.png' alt='Performance Matrix' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='filesystem_comparison.png' alt='Filesystem Comparison' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(" </div>")
+ html.append(" </div>")
+
html.append(" <div class='section'>")
html.append(" <h2>📝 Notes</h2>")
html.append(" <ul>")
@@ -661,10 +868,11 @@ class ResultsAnalyzer:
return "\n".join(html)
except Exception as e:
- self.logger.error(f"Error generating HTML report: {e}")
- return (
- f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
- )
+ import traceback
+
+ tb = traceback.format_exc()
+ self.logger.error(f"Error generating HTML report: {e}\n{tb}")
+ return f"<html><body><h1>Error generating HTML report: {e}</h1><pre>{tb}</pre></body></html>"
def generate_graphs(self) -> bool:
"""Generate performance visualization graphs"""
@@ -691,6 +899,9 @@ class ResultsAnalyzer:
# Graph 4: Performance Comparison Matrix
self._plot_performance_matrix()
+ # Graph 5: Multi-filesystem Comparison (if applicable)
+ self._plot_filesystem_comparison()
+
self.logger.info("Graphs generated successfully")
return True
@@ -699,34 +910,188 @@ class ResultsAnalyzer:
return False
def _plot_insert_performance(self):
- """Plot insert performance metrics"""
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+ """Plot insert performance metrics with node differentiation"""
+ # Group data by node
+ node_performance = {}
- # Extract insert data
- iterations = []
- insert_rates = []
- insert_times = []
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "insert_rates": [],
+ "insert_times": [],
+ "iterations": [],
+ "is_dev": is_dev,
+ }
- for i, result in enumerate(self.results_data):
insert_perf = result.get("insert_performance", {})
if insert_perf:
- iterations.append(i + 1)
- insert_rates.append(insert_perf.get("vectors_per_second", 0))
- insert_times.append(insert_perf.get("total_time_seconds", 0))
-
- # Plot insert rate
- ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
- ax1.set_xlabel("Iteration")
- ax1.set_ylabel("Vectors/Second")
- ax1.set_title("Vector Insert Rate Performance")
- ax1.grid(True, alpha=0.3)
-
- # Plot insert time
- ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
- ax2.set_xlabel("Iteration")
- ax2.set_ylabel("Total Time (seconds)")
- ax2.set_title("Vector Insert Time Performance")
- ax2.grid(True, alpha=0.3)
+ node_performance[hostname]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
+ )
+ node_performance[hostname]["insert_times"].append(
+ insert_perf.get("total_time_seconds", 0)
+ )
+ node_performance[hostname]["iterations"].append(
+ len(node_performance[hostname]["insert_rates"])
+ )
+
+ # Check if we have multiple nodes
+ if len(node_performance) > 1:
+ # Multi-node mode: separate lines for each node
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
+
+ # Sort nodes with baseline first, then dev
+ sorted_nodes = sorted(
+ node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0])
+ )
+
+ # Create color palettes for baseline and dev nodes
+ baseline_colors = [
+ "#2E7D32",
+ "#43A047",
+ "#66BB6A",
+ "#81C784",
+ "#A5D6A7",
+ "#C8E6C9",
+ ] # Greens
+ dev_colors = [
+ "#0D47A1",
+ "#1565C0",
+ "#1976D2",
+ "#1E88E5",
+ "#2196F3",
+ "#42A5F5",
+ "#64B5F6",
+ ] # Blues
+
+ # Additional colors if needed
+ extra_colors = [
+ "#E65100",
+ "#F57C00",
+ "#FF9800",
+ "#FFB300",
+ "#FFC107",
+ "#FFCA28",
+ ] # Oranges
+
+ # Line styles to cycle through
+ line_styles = ["-", "--", "-.", ":"]
+ markers = ["o", "s", "^", "v", "D", "p", "*", "h"]
+
+ baseline_idx = 0
+ dev_idx = 0
+
+ # Use different colors and styles for each node
+ for idx, (hostname, perf_data) in enumerate(sorted_nodes):
+ if not perf_data["insert_rates"]:
+ continue
+
+ # Choose color and style based on node type and index
+ if perf_data["is_dev"]:
+ # Development nodes - blues
+ color = dev_colors[dev_idx % len(dev_colors)]
+ linestyle = line_styles[
+ (dev_idx // len(dev_colors)) % len(line_styles)
+ ]
+ marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev
+ label = f"{hostname} (Dev)"
+ dev_idx += 1
+ else:
+ # Baseline nodes - greens
+ color = baseline_colors[baseline_idx % len(baseline_colors)]
+ linestyle = line_styles[
+ (baseline_idx // len(baseline_colors)) % len(line_styles)
+ ]
+ marker = markers[
+ baseline_idx % 4
+ ] # Use first 4 markers for baseline
+ label = f"{hostname} (Baseline)"
+ baseline_idx += 1
+
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate with alpha for better visibility
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ color=color,
+ linestyle=linestyle,
+ marker=marker,
+ linewidth=1.5,
+ markersize=5,
+ label=label,
+ alpha=0.8,
+ )
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ color=color,
+ linestyle=linestyle,
+ marker=marker,
+ linewidth=1.5,
+ markersize=5,
+ label=label,
+ alpha=0.8,
+ )
+
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Milvus Insert Rate by Node")
+ ax1.grid(True, alpha=0.3)
+ # Position legend outside plot area for better visibility with many nodes
+ ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
+
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Milvus Insert Time by Node")
+ ax2.grid(True, alpha=0.3)
+ # Position legend outside plot area for better visibility with many nodes
+ ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
+
+ plt.suptitle(
+ "Insert Performance Analysis: Baseline vs Development",
+ fontsize=14,
+ y=1.02,
+ )
+ else:
+ # Single node mode: original behavior
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # Extract insert data from single node
+ hostname = list(node_performance.keys())[0] if node_performance else None
+ if hostname:
+ perf_data = node_performance[hostname]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ "b-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title(f"Vector Insert Rate Performance - {hostname}")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ "r-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title(f"Vector Insert Time Performance - {hostname}")
+ ax2.grid(True, alpha=0.3)
plt.tight_layout()
output_file = os.path.join(
@@ -739,52 +1104,110 @@ class ResultsAnalyzer:
plt.close()
def _plot_query_performance(self):
- """Plot query performance metrics"""
+ """Plot query performance metrics comparing baseline vs dev nodes"""
if not self.results_data:
return
- # Collect query performance data
- query_data = []
+ # Group data by filesystem configuration
+ fs_groups = {}
for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_groups:
+ fs_groups[config_key] = {"baseline": [], "dev": []}
+
query_perf = result.get("query_performance", {})
- for topk, topk_data in query_perf.items():
- for batch, batch_data in topk_data.items():
- query_data.append(
- {
- "topk": topk.replace("topk_", ""),
- "batch": batch.replace("batch_", ""),
- "qps": batch_data.get("queries_per_second", 0),
- "avg_time": batch_data.get("average_time_seconds", 0)
- * 1000, # Convert to ms
- }
- )
+ if query_perf:
+ node_type = "dev" if is_dev else "baseline"
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ fs_groups[config_key][node_type].append(
+ {
+ "hostname": hostname,
+ "topk": topk.replace("topk_", ""),
+ "batch": batch.replace("batch_", ""),
+ "qps": batch_data.get("queries_per_second", 0),
+ "avg_time": batch_data.get("average_time_seconds", 0)
+ * 1000,
+ }
+ )
- if not query_data:
+ if not fs_groups:
return
- df = pd.DataFrame(query_data)
+ # Create subplots for each filesystem config
+ n_configs = len(fs_groups)
+ fig_height = max(8, 4 * n_configs)
+ fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height))
- # Create subplots
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
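+        # With a single row, plt.subplots() returns a 1-D array of Axes;
+        # reshape it so the axes[idx][col] indexing below works for any n_configs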
+ if n_configs == 1:
+ axes = axes.reshape(1, -1)
- # QPS heatmap
- qps_pivot = df.pivot_table(
- values="qps", index="topk", columns="batch", aggfunc="mean"
- )
- sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
- ax1.set_title("Queries Per Second (QPS)")
- ax1.set_xlabel("Batch Size")
- ax1.set_ylabel("Top-K")
-
- # Latency heatmap
- latency_pivot = df.pivot_table(
- values="avg_time", index="topk", columns="batch", aggfunc="mean"
- )
- sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
- ax2.set_title("Average Query Latency (ms)")
- ax2.set_xlabel("Batch Size")
- ax2.set_ylabel("Top-K")
+ for idx, (config_key, data) in enumerate(sorted(fs_groups.items())):
+ # Create DataFrames for baseline and dev
+ baseline_df = (
+ pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame()
+ )
+ dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame()
+
+ # Baseline QPS heatmap
+ ax_base = axes[idx][0]
+ if not baseline_df.empty:
+ baseline_pivot = baseline_df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(
+ baseline_pivot,
+ annot=True,
+ fmt=".1f",
+ ax=ax_base,
+ cmap="Greens",
+ cbar_kws={"label": "QPS"},
+ )
+ ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
+ ax_base.set_xlabel("Batch Size")
+ ax_base.set_ylabel("Top-K")
+ else:
+ ax_base.text(
+ 0.5,
+ 0.5,
+ f"No baseline data for {config_key}",
+ ha="center",
+ va="center",
+ transform=ax_base.transAxes,
+ )
+ ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
+ # Dev QPS heatmap
+ ax_dev = axes[idx][1]
+ if not dev_df.empty:
+ dev_pivot = dev_df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(
+ dev_pivot,
+ annot=True,
+ fmt=".1f",
+ ax=ax_dev,
+ cmap="Blues",
+ cbar_kws={"label": "QPS"},
+ )
+ ax_dev.set_title(f"{config_key.upper()} - Development QPS")
+ ax_dev.set_xlabel("Batch Size")
+ ax_dev.set_ylabel("Top-K")
+ else:
+ ax_dev.text(
+ 0.5,
+ 0.5,
+ f"No dev data for {config_key}",
+ ha="center",
+ va="center",
+ transform=ax_dev.transAxes,
+ )
+ ax_dev.set_title(f"{config_key.upper()} - Development QPS")
+
+ plt.suptitle("Query Performance: Baseline vs Development", fontsize=16, y=1.02)
plt.tight_layout()
output_file = os.path.join(
self.output_dir,
@@ -796,32 +1219,101 @@ class ResultsAnalyzer:
plt.close()
def _plot_index_performance(self):
- """Plot index creation performance"""
- iterations = []
- index_times = []
+ """Plot index creation performance comparing baseline vs dev"""
+ # Group by filesystem configuration
+ fs_groups = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_groups:
+ fs_groups[config_key] = {"baseline": [], "dev": []}
- for i, result in enumerate(self.results_data):
index_perf = result.get("index_performance", {})
if index_perf:
- iterations.append(i + 1)
- index_times.append(index_perf.get("creation_time_seconds", 0))
+ time = index_perf.get("creation_time_seconds", 0)
+ if time > 0:
+ node_type = "dev" if is_dev else "baseline"
+ fs_groups[config_key][node_type].append(time)
- if not index_times:
+ if not fs_groups:
return
- plt.figure(figsize=(10, 6))
- plt.bar(iterations, index_times, alpha=0.7, color="green")
- plt.xlabel("Iteration")
- plt.ylabel("Index Creation Time (seconds)")
- plt.title("Index Creation Performance")
- plt.grid(True, alpha=0.3)
-
- # Add average line
- avg_time = np.mean(index_times)
- plt.axhline(
- y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
+ # Create comparison bar chart
+ fig, ax = plt.subplots(figsize=(14, 8))
+
+ configs = sorted(fs_groups.keys())
+ x = np.arange(len(configs))
+ width = 0.35
+
+ # Calculate averages for each config
+ baseline_avgs = []
+ dev_avgs = []
+ baseline_stds = []
+ dev_stds = []
+
+ for config in configs:
+ baseline_times = fs_groups[config]["baseline"]
+ dev_times = fs_groups[config]["dev"]
+
+ baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0)
+ dev_avgs.append(np.mean(dev_times) if dev_times else 0)
+ baseline_stds.append(np.std(baseline_times) if baseline_times else 0)
+ dev_stds.append(np.std(dev_times) if dev_times else 0)
+
+ # Create bars
+ bars1 = ax.bar(
+ x - width / 2,
+ baseline_avgs,
+ width,
+ yerr=baseline_stds,
+ label="Baseline",
+ color="#4CAF50",
+ capsize=5,
+ )
+ bars2 = ax.bar(
+ x + width / 2,
+ dev_avgs,
+ width,
+ yerr=dev_stds,
+ label="Development",
+ color="#2196F3",
+ capsize=5,
)
- plt.legend()
+
+ # Add value labels on bars
+ for bar, val in zip(bars1, baseline_avgs):
+ if val > 0:
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.3f}s",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ for bar, val in zip(bars2, dev_avgs):
+ if val > 0:
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.3f}s",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ ax.set_xlabel("Filesystem Configuration", fontsize=12)
+ ax.set_ylabel("Index Creation Time (seconds)", fontsize=12)
+ ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14)
+ ax.set_xticks(x)
+ ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right")
+ ax.legend(loc="upper right")
+ ax.grid(True, alpha=0.3, axis="y")
output_file = os.path.join(
self.output_dir,
@@ -833,61 +1325,148 @@ class ResultsAnalyzer:
plt.close()
def _plot_performance_matrix(self):
- """Plot comprehensive performance comparison matrix"""
+ """Plot performance comparison matrix for each filesystem config"""
if len(self.results_data) < 2:
return
- # Extract key metrics for comparison
- metrics = []
- for i, result in enumerate(self.results_data):
+ # Group by filesystem configuration
+ fs_metrics = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_metrics:
+ fs_metrics[config_key] = {"baseline": [], "dev": []}
+
+ # Collect metrics
insert_perf = result.get("insert_performance", {})
index_perf = result.get("index_performance", {})
+ query_perf = result.get("query_performance", {})
metric = {
- "iteration": i + 1,
+ "hostname": hostname,
"insert_rate": insert_perf.get("vectors_per_second", 0),
"index_time": index_perf.get("creation_time_seconds", 0),
}
- # Add query metrics
- query_perf = result.get("query_performance", {})
+ # Get representative query performance (topk_10, batch_1)
if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
"queries_per_second", 0
)
+ else:
+ metric["query_qps"] = 0
- metrics.append(metric)
+ node_type = "dev" if is_dev else "baseline"
+ fs_metrics[config_key][node_type].append(metric)
- df = pd.DataFrame(metrics)
+ if not fs_metrics:
+ return
- # Normalize metrics for comparison
- numeric_cols = ["insert_rate", "index_time", "query_qps"]
- for col in numeric_cols:
- if col in df.columns:
- df[f"{col}_norm"] = (df[col] - df[col].min()) / (
- df[col].max() - df[col].min() + 1e-6
- )
+ # Create subplots for each filesystem
+ n_configs = len(fs_metrics)
+ n_cols = min(3, n_configs)
+ n_rows = (n_configs + n_cols - 1) // n_cols
+
+ fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5))
+ if n_rows == 1 and n_cols == 1:
+ axes = [[axes]]
+ elif n_rows == 1:
+ axes = [axes]
+ elif n_cols == 1:
+ axes = [[ax] for ax in axes]
+
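+        # plt.subplots() returns a bare Axes for a 1x1 grid and a 1-D array when
+        # only one row or column exists; the wrapping above keeps axes[row][col] uniform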
+ for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())):
+ row = idx // n_cols
+ col = idx % n_cols
+ ax = axes[row][col]
+
+ # Calculate averages
+ baseline_metrics = data["baseline"]
+ dev_metrics = data["dev"]
+
+ if baseline_metrics and dev_metrics:
+ categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"]
+
+ baseline_avg = [
+ np.mean([m["insert_rate"] for m in baseline_metrics]),
+ np.mean([m["index_time"] for m in baseline_metrics]),
+ np.mean([m["query_qps"] for m in baseline_metrics]),
+ ]
- # Create radar chart
- fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
+ dev_avg = [
+ np.mean([m["insert_rate"] for m in dev_metrics]),
+ np.mean([m["index_time"] for m in dev_metrics]),
+ np.mean([m["query_qps"] for m in dev_metrics]),
+ ]
- angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
- angles += angles[:1] # Complete the circle
+ x = np.arange(len(categories))
+ width = 0.35
- for i, row in df.iterrows():
- values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
- values += values[:1] # Complete the circle
+ bars1 = ax.bar(
+ x - width / 2,
+ baseline_avg,
+ width,
+ label="Baseline",
+ color="#4CAF50",
+ )
+ bars2 = ax.bar(
+ x + width / 2, dev_avg, width, label="Development", color="#2196F3"
+ )
- ax.plot(
- angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
- )
- ax.fill(angles, values, alpha=0.25)
+ # Add value labels
+ for bar, val in zip(bars1, baseline_avg):
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.0f}" if val > 100 else f"{val:.2f}",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
- ax.set_xticks(angles[:-1])
- ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
- ax.set_ylim(0, 1)
- ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
- ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
+ for bar, val in zip(bars2, dev_avg):
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.0f}" if val > 100 else f"{val:.2f}",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
+
+ ax.set_xlabel("Metrics")
+ ax.set_ylabel("Value")
+ ax.set_title(f"{config_key.upper()}")
+ ax.set_xticks(x)
+ ax.set_xticklabels(categories)
+ ax.legend(loc="upper right", fontsize=8)
+ ax.grid(True, alpha=0.3, axis="y")
+ else:
+ ax.text(
+ 0.5,
+ 0.5,
+ f"Insufficient data\nfor {config_key}",
+ ha="center",
+ va="center",
+ transform=ax.transAxes,
+ )
+ ax.set_title(f"{config_key.upper()}")
+
+ # Hide unused subplots
+ for idx in range(n_configs, n_rows * n_cols):
+ row = idx // n_cols
+ col = idx % n_cols
+ axes[row][col].set_visible(False)
+
+ plt.suptitle(
+ "Performance Comparison Matrix: Baseline vs Development",
+ fontsize=14,
+ y=1.02,
+ )
output_file = os.path.join(
self.output_dir,
@@ -898,6 +1477,149 @@ class ResultsAnalyzer:
)
plt.close()
+ def _plot_filesystem_comparison(self):
+ """Plot node performance comparison chart"""
+ if len(self.results_data) < 2:
+ return
+
+ # Group results by node
+ node_performance = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "insert_rates": [],
+ "index_times": [],
+ "query_qps": [],
+ "is_dev": is_dev,
+ }
+
+ # Collect metrics
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ node_performance[hostname]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
+ )
+
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ node_performance[hostname]["index_times"].append(
+ index_perf.get("creation_time_seconds", 0)
+ )
+
+ # Get top-10 batch-1 query performance as representative
+ query_perf = result.get("query_performance", {})
+ if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
+ qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0)
+ node_performance[hostname]["query_qps"].append(qps)
+
+ # Only create comparison if we have multiple nodes
+ if len(node_performance) > 1:
+ # Calculate averages
+ node_metrics = {}
+ for hostname, perf_data in node_performance.items():
+ node_metrics[hostname] = {
+ "avg_insert_rate": (
+ np.mean(perf_data["insert_rates"])
+ if perf_data["insert_rates"]
+ else 0
+ ),
+ "avg_index_time": (
+ np.mean(perf_data["index_times"])
+ if perf_data["index_times"]
+ else 0
+ ),
+ "avg_query_qps": (
+ np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0
+ ),
+ "is_dev": perf_data["is_dev"],
+ }
+
+ # Create comparison bar chart with more space
+ fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
+
+ # Sort nodes with baseline first
+ sorted_nodes = sorted(
+ node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0])
+ )
+ node_names = [hostname for hostname, _ in sorted_nodes]
+
+ # Use different colors for baseline vs dev
+ colors = [
+ "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3"
+ for hostname in node_names
+ ]
+
+ # Add labels for clarity
+ labels = [
+ f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})"
+ for hostname in node_names
+ ]
+
+ # Insert rate comparison
+ insert_rates = [
+ node_metrics[hostname]["avg_insert_rate"] for hostname in node_names
+ ]
+ bars1 = ax1.bar(labels, insert_rates, color=colors)
+ ax1.set_title("Average Milvus Insert Rate by Node")
+ ax1.set_ylabel("Vectors/Second")
+ # Rotate labels for better readability
+ ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Index time comparison (lower is better)
+ index_times = [
+ node_metrics[hostname]["avg_index_time"] for hostname in node_names
+ ]
+ bars2 = ax2.bar(labels, index_times, color=colors)
+ ax2.set_title("Average Milvus Index Time by Node")
+ ax2.set_ylabel("Seconds (Lower is Better)")
+ ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Query QPS comparison
+ query_qps = [
+ node_metrics[hostname]["avg_query_qps"] for hostname in node_names
+ ]
+ bars3 = ax3.bar(labels, query_qps, color=colors)
+ ax3.set_title("Average Milvus Query QPS by Node")
+ ax3.set_ylabel("Queries/Second")
+ ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Add value labels on bars
+ for bars, values in [
+ (bars1, insert_rates),
+ (bars2, index_times),
+ (bars3, query_qps),
+ ]:
+ for bar, value in zip(bars, values):
+ height = bar.get_height()
+ ax = bar.axes
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height + height * 0.01,
+ f"{value:.1f}",
+ ha="center",
+ va="bottom",
+ fontsize=10,
+ )
+
+ plt.suptitle(
+ "Milvus Performance Comparison: Baseline vs Development Nodes",
+ fontsize=16,
+ y=1.02,
+ )
+ plt.tight_layout()
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"filesystem_comparison.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
def analyze(self) -> bool:
"""Run complete analysis"""
self.logger.info("Starting results analysis...")
diff --git a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
index 645bac9e..b3681ff9 100755
--- a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
+++ b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
@@ -29,17 +29,18 @@ def extract_filesystem_from_filename(filename):
if "_" in node_name:
parts = node_name.split("_")
node_name = "_".join(parts[:-1]) # Remove last part (iteration)
-
+
# Extract filesystem type from node name
if "-xfs-" in node_name:
return "xfs"
elif "-ext4-" in node_name:
- return "ext4"
+ return "ext4"
elif "-btrfs-" in node_name:
return "btrfs"
-
+
return "unknown"
+
def extract_node_config_from_filename(filename):
"""Extract detailed node configuration from filename"""
# Expected format: results_debian13-ai-xfs-4k-4ks_1.json
@@ -50,14 +51,15 @@ def extract_node_config_from_filename(filename):
if "_" in node_name:
parts = node_name.split("_")
node_name = "_".join(parts[:-1]) # Remove last part (iteration)
-
+
# Remove -dev suffix if present
node_name = node_name.replace("-dev", "")
-
+
return node_name.replace("debian13-ai-", "")
-
+
return "unknown"
+
def detect_filesystem():
"""Detect the filesystem type of /data on test nodes"""
# This is now a fallback - we primarily use filename-based detection
@@ -104,7 +106,7 @@ def load_results(results_dir):
# Extract node type from filename
filename = os.path.basename(json_file)
data["filename"] = filename
-
+
# Extract filesystem type and config from filename
data["filesystem"] = extract_filesystem_from_filename(filename)
data["node_config"] = extract_node_config_from_filename(filename)
diff --git a/playbooks/roles/ai_collect_results/files/generate_graphs.py b/playbooks/roles/ai_collect_results/files/generate_graphs.py
index 53a835e2..fafc62bf 100755
--- a/playbooks/roles/ai_collect_results/files/generate_graphs.py
+++ b/playbooks/roles/ai_collect_results/files/generate_graphs.py
@@ -9,7 +9,6 @@ import sys
import glob
import numpy as np
import matplotlib
-
matplotlib.use("Agg") # Use non-interactive backend
import matplotlib.pyplot as plt
from datetime import datetime
@@ -17,68 +16,78 @@ from pathlib import Path
from collections import defaultdict
+def _extract_filesystem_config(result):
+ """Extract filesystem type and block size from result data.
+ Returns (fs_type, block_size, config_key)"""
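+    # Illustrative example (derived from the parsing below):
+    #   "results_debian13-ai-xfs-4k-4ks_2.json" -> ("xfs", "4k", "xfs-4k")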
+ filename = result.get("_file", "")
+
+ # Primary: Extract filesystem type from filename (more reliable than JSON)
+ fs_type = "unknown"
+ block_size = "default"
+
+ if "xfs" in filename:
+ fs_type = "xfs"
+ # Check larger sizes first to avoid substring matches
+ if "64k" in filename and "64k-" in filename:
+ block_size = "64k"
+ elif "32k" in filename and "32k-" in filename:
+ block_size = "32k"
+ elif "16k" in filename and "16k-" in filename:
+ block_size = "16k"
+ elif "4k" in filename and "4k-" in filename:
+ block_size = "4k"
+ elif "ext4" in filename:
+ fs_type = "ext4"
+        if "16k" in filename:
+            block_size = "16k"
+        elif "4k" in filename:
+            block_size = "4k"
+ elif "btrfs" in filename:
+ fs_type = "btrfs"
+
+ # Fallback: Check JSON data if filename parsing failed
+ if fs_type == "unknown":
+ fs_type = result.get("filesystem", "unknown")
+
+ # Create descriptive config key
+ config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+ return fs_type, block_size, config_key
+
+
+def _extract_node_info(result):
+ """Extract node hostname and determine if it's a dev node.
+ Returns (hostname, is_dev_node)"""
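+    # e.g. when system_info carries no hostname:
+    #   "results_debian13-ai-xfs-4k-4ks-dev_1.json" -> ("debian13-ai-xfs-4k-4ks-dev", True)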
+ # Get hostname from system_info (preferred) or fall back to filename
+ system_info = result.get("system_info", {})
+ hostname = system_info.get("hostname", "")
+
+ # If no hostname in system_info, try extracting from filename
+ if not hostname:
+ filename = result.get("_file", "")
+ # Remove results_ prefix and .json suffix
+ hostname = filename.replace("results_", "").replace(".json", "")
+ # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit():
+ hostname = "_".join(hostname.split("_")[:-1])
+
+ # Determine if this is a dev node
+ is_dev = hostname.endswith("-dev")
+
+ return hostname, is_dev
+
+
def load_results(results_dir):
"""Load all JSON result files from the directory"""
results = []
- json_files = glob.glob(os.path.join(results_dir, "*.json"))
+ # Only load results_*.json files, not consolidated or other JSON files
+ json_files = glob.glob(os.path.join(results_dir, "results_*.json"))
for json_file in json_files:
try:
with open(json_file, "r") as f:
data = json.load(f)
- # Extract filesystem info - prefer from JSON data over filename
- filename = os.path.basename(json_file)
-
- # First, try to get filesystem from the JSON data itself
- fs_type = data.get("filesystem", None)
-
- # If not in JSON, try to parse from filename (backwards compatibility)
- if not fs_type:
- parts = filename.replace("results_", "").replace(".json", "").split("-")
-
- # Parse host info
- if "debian13-ai-" in filename:
- host_parts = (
- filename.replace("results_debian13-ai-", "")
- .replace("_1.json", "")
- .replace("_2.json", "")
- .replace("_3.json", "")
- .split("-")
- )
- if "xfs" in host_parts[0]:
- fs_type = "xfs"
- # Extract block size (e.g., "4k", "16k", etc.)
- block_size = host_parts[1] if len(host_parts) > 1 else "unknown"
- elif "ext4" in host_parts[0]:
- fs_type = "ext4"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "btrfs" in host_parts[0]:
- fs_type = "btrfs"
- block_size = "default"
- else:
- fs_type = "unknown"
- block_size = "unknown"
- else:
- fs_type = "unknown"
- block_size = "unknown"
- else:
- # If filesystem came from JSON, set appropriate block size
- if fs_type == "btrfs":
- block_size = "default"
- elif fs_type in ["ext4", "xfs"]:
- block_size = data.get("block_size", "4k")
- else:
- block_size = data.get("block_size", "default")
-
- is_dev = "dev" in filename
-
- # Use filesystem from JSON if available, otherwise use parsed value
- if "filesystem" not in data:
- data["filesystem"] = fs_type
- data["block_size"] = block_size
- data["is_dev"] = is_dev
- data["filename"] = filename
-
+ # Add filename for filesystem detection
+ data["_file"] = os.path.basename(json_file)
results.append(data)
except Exception as e:
print(f"Error loading {json_file}: {e}")
@@ -86,554 +95,243 @@ def load_results(results_dir):
return results
-def create_filesystem_comparison_chart(results, output_dir):
- """Create a bar chart comparing performance across filesystems"""
- # Group by filesystem and baseline/dev
- fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- category = "dev" if result.get("is_dev", False) else "baseline"
-
- # Extract actual performance data from results
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
- fs_data[fs][category].append(insert_qps)
-
- # Prepare data for plotting
- filesystems = list(fs_data.keys())
- baseline_means = [
- np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
- for fs in filesystems
- ]
- dev_means = [
- np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
- ]
-
- x = np.arange(len(filesystems))
- width = 0.35
-
- fig, ax = plt.subplots(figsize=(10, 6))
- baseline_bars = ax.bar(
- x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
- )
- dev_bars = ax.bar(
- x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
- )
-
- ax.set_xlabel("Filesystem")
- ax.set_ylabel("Insert QPS")
- ax.set_title("Vector Database Performance by Filesystem")
- ax.set_xticks(x)
- ax.set_xticklabels(filesystems)
- ax.legend()
- ax.grid(True, alpha=0.3)
-
- # Add value labels on bars
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax.annotate(
- f"{height:.0f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- )
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
- plt.close()
-
-
-def create_block_size_analysis(results, output_dir):
- """Create analysis for different block sizes (XFS specific)"""
- # Filter XFS results
- xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
-
- if not xfs_results:
+def create_simple_performance_trends(results, output_dir):
+ """Create multi-node performance trends chart"""
+ if not results:
return
- # Group by block size
- block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in xfs_results:
- block_size = result.get("block_size", "unknown")
- category = "dev" if result.get("is_dev", False) else "baseline"
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
- block_size_data[block_size][category].append(insert_qps)
-
- # Sort block sizes
- block_sizes = sorted(
- block_size_data.keys(),
- key=lambda x: (
- int(x.replace("k", "").replace("s", ""))
- if x not in ["unknown", "default"]
- else 0
- ),
- )
-
- # Create grouped bar chart
- baseline_means = [
- (
- np.mean(block_size_data[bs]["baseline"])
- if block_size_data[bs]["baseline"]
- else 0
- )
- for bs in block_sizes
- ]
- dev_means = [
- np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
- for bs in block_sizes
- ]
-
- x = np.arange(len(block_sizes))
- width = 0.35
-
- fig, ax = plt.subplots(figsize=(12, 6))
- baseline_bars = ax.bar(
- x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
- )
- dev_bars = ax.bar(
- x + width / 2, dev_means, width, label="Development", color="#d62728"
- )
-
- ax.set_xlabel("Block Size")
- ax.set_ylabel("Insert QPS")
- ax.set_title("XFS Performance by Block Size")
- ax.set_xticks(x)
- ax.set_xticklabels(block_sizes)
- ax.legend()
- ax.grid(True, alpha=0.3)
-
- # Add value labels
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax.annotate(
- f"{height:.0f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- )
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
- plt.close()
-
-
-def create_heatmap_analysis(results, output_dir):
- """Create a heatmap showing performance across all configurations"""
- # Group data by configuration and version
- config_data = defaultdict(
- lambda: {
- "baseline": {"insert": 0, "query": 0},
- "dev": {"insert": 0, "query": 0},
- }
- )
+ # Group results by node
+ node_performance = defaultdict(lambda: {
+ "insert_rates": [],
+ "insert_times": [],
+ "iterations": [],
+ "is_dev": False,
+ })
for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- config = f"{fs}-{block_size}"
- version = "dev" if result.get("is_dev", False) else "baseline"
-
- # Get actual insert performance
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
-
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- config_data[config][version]["insert"] = insert_qps
- config_data[config][version]["query"] = query_qps
-
- # Sort configurations
- configs = sorted(config_data.keys())
-
- # Prepare data for heatmap
- insert_baseline = [config_data[c]["baseline"]["insert"] for c in configs]
- insert_dev = [config_data[c]["dev"]["insert"] for c in configs]
- query_baseline = [config_data[c]["baseline"]["query"] for c in configs]
- query_dev = [config_data[c]["dev"]["query"] for c in configs]
-
- # Create figure with custom heatmap
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
-
- # Create data matrices
- insert_data = np.array([insert_baseline, insert_dev]).T
- query_data = np.array([query_baseline, query_dev]).T
-
- # Insert QPS heatmap
- im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
- ax1.set_xticks([0, 1])
- ax1.set_xticklabels(["Baseline", "Development"])
- ax1.set_yticks(range(len(configs)))
- ax1.set_yticklabels(configs)
- ax1.set_title("Insert Performance Heatmap")
- ax1.set_ylabel("Configuration")
-
- # Add text annotations
- for i in range(len(configs)):
- for j in range(2):
- text = ax1.text(
- j,
- i,
- f"{int(insert_data[i, j])}",
- ha="center",
- va="center",
- color="black",
- )
+ hostname, is_dev = _extract_node_info(result)
+
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "insert_rates": [],
+ "insert_times": [],
+ "iterations": [],
+ "is_dev": is_dev,
+ }
- # Add colorbar
- cbar1 = plt.colorbar(im1, ax=ax1)
- cbar1.set_label("Insert QPS")
-
- # Query QPS heatmap
- im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
- ax2.set_xticks([0, 1])
- ax2.set_xticklabels(["Baseline", "Development"])
- ax2.set_yticks(range(len(configs)))
- ax2.set_yticklabels(configs)
- ax2.set_title("Query Performance Heatmap")
-
- # Add text annotations
- for i in range(len(configs)):
- for j in range(2):
- text = ax2.text(
- j,
- i,
- f"{int(query_data[i, j])}",
- ha="center",
- va="center",
- color="black",
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ node_performance[hostname]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
+ )
+            node_performance[hostname]["insert_times"].append(
+                insert_perf.get("total_time_seconds", 0)
+            )
+            node_performance[hostname]["iterations"].append(
+                len(node_performance[hostname]["insert_rates"])
+            )
- # Add colorbar
- cbar2 = plt.colorbar(im2, ax=ax2)
- cbar2.set_label("Query QPS")
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150)
- plt.close()
-
-
-def create_performance_trends(results, output_dir):
- """Create line charts showing performance trends"""
- # Group by filesystem type
- fs_types = defaultdict(
- lambda: {
- "configs": [],
- "baseline_insert": [],
- "dev_insert": [],
- "baseline_query": [],
- "dev_query": [],
- }
- )
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- config = f"{block_size}"
-
- if config not in fs_types[fs]["configs"]:
- fs_types[fs]["configs"].append(config)
- fs_types[fs]["baseline_insert"].append(0)
- fs_types[fs]["dev_insert"].append(0)
- fs_types[fs]["baseline_query"].append(0)
- fs_types[fs]["dev_query"].append(0)
-
- idx = fs_types[fs]["configs"].index(config)
-
- # Calculate average query QPS from all test configurations
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- if result.get("is_dev", False):
- if "insert_performance" in result:
- fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
- "vectors_per_second", 0
- )
- fs_types[fs]["dev_query"][idx] = query_qps
- else:
- if "insert_performance" in result:
- fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
- "vectors_per_second", 0
- )
- fs_types[fs]["baseline_query"][idx] = query_qps
-
- # Create separate plots for each filesystem
- for fs, data in fs_types.items():
- if not data["configs"]:
- continue
-
- fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
- x = range(len(data["configs"]))
-
- # Insert performance
- ax1.plot(
- x,
- data["baseline_insert"],
- "o-",
- label="Baseline",
- linewidth=2,
- markersize=8,
- )
- ax1.plot(
- x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
- )
- ax1.set_xlabel("Configuration")
- ax1.set_ylabel("Insert QPS")
- ax1.set_title(f"{fs.upper()} Insert Performance")
- ax1.set_xticks(x)
- ax1.set_xticklabels(data["configs"])
- ax1.legend()
+    # Check if we have results from multiple nodes
+    if len(node_performance) > 1:
+        # Multi-node mode: separate lines for each node
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ colors = ["b", "r", "g", "m", "c", "y", "k"]
+ color_idx = 0
+
+        for hostname, perf_data in node_performance.items():
+ if not perf_data["insert_rates"]:
+ continue
+
+ color = colors[color_idx % len(colors)]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ f"{color}-o",
+ linewidth=2,
+ markersize=6,
+                label=hostname,
+ )
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ f"{color}-o",
+ linewidth=2,
+ markersize=6,
+                label=hostname,
+ )
+
+ color_idx += 1
+
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+        ax1.set_title("Milvus Insert Rate by Node")
ax1.grid(True, alpha=0.3)
-
- # Query performance
- ax2.plot(
- x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
- )
- ax2.plot(
- x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
- )
- ax2.set_xlabel("Configuration")
- ax2.set_ylabel("Query QPS")
- ax2.set_title(f"{fs.upper()} Query Performance")
- ax2.set_xticks(x)
- ax2.set_xticklabels(data["configs"])
- ax2.legend()
+ ax1.legend()
+
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+        ax2.set_title("Milvus Insert Time by Node")
ax2.grid(True, alpha=0.3)
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
- plt.close()
+ ax2.legend()
+ else:
+        # Single node mode: original behavior
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+        # Extract insert data from the single node
+        hostname = list(node_performance.keys())[0] if node_performance else None
+        if hostname:
+            perf_data = node_performance[hostname]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ "b-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Vector Insert Rate Performance")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ "r-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Vector Insert Time Performance")
+ ax2.grid(True, alpha=0.3)
+
+ plt.tight_layout()
+ plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+ plt.close()
-def create_simple_performance_trends(results, output_dir):
- """Create a simple performance trends chart for basic Milvus testing"""
+def create_heatmap_analysis(results, output_dir):
+ """Create multi-filesystem heatmap showing query performance"""
if not results:
return
-
- # Separate baseline and dev results
- baseline_results = [r for r in results if not r.get("is_dev", False)]
- dev_results = [r for r in results if r.get("is_dev", False)]
-
- if not baseline_results and not dev_results:
- return
-
- fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
- # Prepare data
- baseline_insert = []
- baseline_query = []
- dev_insert = []
- dev_query = []
- labels = []
-
- # Process baseline results
- for i, result in enumerate(baseline_results):
- if "insert_performance" in result:
- baseline_insert.append(result["insert_performance"].get("vectors_per_second", 0))
- else:
- baseline_insert.append(0)
+
+ # Group data by filesystem configuration
+ fs_performance = defaultdict(lambda: {
+ "query_data": [],
+ "config_key": "",
+ })
+
+ for result in results:
+ fs_type, block_size, config_key = _extract_filesystem_config(result)
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
- count += 1
- if count > 0:
- query_qps = total_qps / count
- baseline_query.append(query_qps)
- labels.append(f"Run {i+1}")
-
- # Process dev results
- for result in dev_results:
- if "insert_performance" in result:
- dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
- else:
- dev_insert.append(0)
+ query_perf = result.get("query_performance", {})
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ fs_performance[config_key]["query_data"].append({
+ "topk": topk,
+ "batch": batch,
+ "qps": qps,
+ })
+ fs_performance[config_key]["config_key"] = config_key
+
+ # Check if we have multi-filesystem data
+ if len(fs_performance) > 1:
+ # Multi-filesystem mode: separate heatmaps for each filesystem
+ num_fs = len(fs_performance)
+ fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
+ if num_fs == 1:
+ axes = [axes]
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
- count += 1
- if count > 0:
- query_qps = total_qps / count
- dev_query.append(query_qps)
-
- x = range(len(baseline_results) if baseline_results else len(dev_results))
-
- # Insert performance
- if baseline_insert:
- ax1.plot(x, baseline_insert, "o-", label="Baseline", linewidth=2, markersize=8)
- if dev_insert:
- ax1.plot(x[:len(dev_insert)], dev_insert, "s-", label="Development", linewidth=2, markersize=8)
- ax1.set_xlabel("Test Run")
- ax1.set_ylabel("Insert QPS")
- ax1.set_title("Milvus Insert Performance")
- ax1.set_xticks(x)
- ax1.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
- ax1.legend()
- ax1.grid(True, alpha=0.3)
-
- # Query performance
- if baseline_query:
- ax2.plot(x, baseline_query, "o-", label="Baseline", linewidth=2, markersize=8)
- if dev_query:
- ax2.plot(x[:len(dev_query)], dev_query, "s-", label="Development", linewidth=2, markersize=8)
- ax2.set_xlabel("Test Run")
- ax2.set_ylabel("Query QPS")
- ax2.set_title("Milvus Query Performance")
- ax2.set_xticks(x)
- ax2.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
- ax2.legend()
- ax2.grid(True, alpha=0.3)
+ # Define common structure for consistency
+ topk_order = ["topk_1", "topk_10", "topk_100"]
+ batch_order = ["batch_1", "batch_10", "batch_100"]
+
+ for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+ # Create matrix for this filesystem
+ matrix = np.zeros((len(topk_order), len(batch_order)))
+
+ # Fill matrix with data
+ query_dict = {}
+ for item in perf_data["query_data"]:
+ query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+ for i, topk in enumerate(topk_order):
+ for j, batch in enumerate(batch_order):
+ matrix[i, j] = query_dict.get((topk, batch), 0)
+
+ # Plot heatmap
+ im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
+ axes[idx].set_title(f"{config_key.upper()} Query Performance")
+ axes[idx].set_xticks(range(len(batch_order)))
+ axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+ axes[idx].set_yticks(range(len(topk_order)))
+ axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+ # Add text annotations
+ for i in range(len(topk_order)):
+ for j in range(len(batch_order)):
+ axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
+ ha="center", va="center", color="white", fontweight="bold")
+
+ # Add colorbar
+ cbar = plt.colorbar(im, ax=axes[idx])
+ cbar.set_label('Queries Per Second (QPS)')
+ else:
+ # Single filesystem mode
+ fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+ if fs_performance:
+ config_key = list(fs_performance.keys())[0]
+ perf_data = fs_performance[config_key]
+
+ # Create matrix
+ topk_order = ["topk_1", "topk_10", "topk_100"]
+ batch_order = ["batch_1", "batch_10", "batch_100"]
+ matrix = np.zeros((len(topk_order), len(batch_order)))
+
+ # Fill matrix with data
+ query_dict = {}
+ for item in perf_data["query_data"]:
+ query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+ for i, topk in enumerate(topk_order):
+ for j, batch in enumerate(batch_order):
+ matrix[i, j] = query_dict.get((topk, batch), 0)
+
+ # Plot heatmap
+ im = ax.imshow(matrix, cmap='viridis', aspect='auto')
+ ax.set_title("Milvus Query Performance Heatmap")
+ ax.set_xticks(range(len(batch_order)))
+ ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+ ax.set_yticks(range(len(topk_order)))
+ ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+ # Add text annotations
+ for i in range(len(topk_order)):
+ for j in range(len(batch_order)):
+ ax.text(j, i, f'{matrix[i, j]:.0f}',
+ ha="center", va="center", color="white", fontweight="bold")
+
+ # Add colorbar
+ cbar = plt.colorbar(im, ax=ax)
+ cbar.set_label('Queries Per Second (QPS)')
plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
+ plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
plt.close()
-def generate_summary_statistics(results, output_dir):
- """Generate summary statistics and save to JSON"""
- summary = {
- "total_tests": len(results),
- "filesystems_tested": list(
- set(r.get("filesystem", "unknown") for r in results)
- ),
- "configurations": {},
- "performance_summary": {
- "best_insert_qps": {"value": 0, "config": ""},
- "best_query_qps": {"value": 0, "config": ""},
- "average_insert_qps": 0,
- "average_query_qps": 0,
- },
- }
-
- # Calculate statistics
- all_insert_qps = []
- all_query_qps = []
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- is_dev = "dev" if result.get("is_dev", False) else "baseline"
- config_name = f"{fs}-{block_size}-{is_dev}"
-
- # Get actual performance metrics
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
-
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- all_insert_qps.append(insert_qps)
- all_query_qps.append(query_qps)
-
- summary["configurations"][config_name] = {
- "insert_qps": insert_qps,
- "query_qps": query_qps,
- "host": result.get("host", "unknown"),
- }
-
- if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
- summary["performance_summary"]["best_insert_qps"] = {
- "value": insert_qps,
- "config": config_name,
- }
-
- if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
- summary["performance_summary"]["best_query_qps"] = {
- "value": query_qps,
- "config": config_name,
- }
-
- summary["performance_summary"]["average_insert_qps"] = (
- np.mean(all_insert_qps) if all_insert_qps else 0
- )
- summary["performance_summary"]["average_query_qps"] = (
- np.mean(all_query_qps) if all_query_qps else 0
- )
-
- # Save summary
- with open(os.path.join(output_dir, "summary.json"), "w") as f:
- json.dump(summary, f, indent=2)
-
- return summary
-
-
def main():
if len(sys.argv) < 3:
print("Usage: generate_graphs.py <results_dir> <output_dir>")
@@ -642,37 +340,23 @@ def main():
results_dir = sys.argv[1]
output_dir = sys.argv[2]
- # Create output directory
+ # Ensure output directory exists
os.makedirs(output_dir, exist_ok=True)
# Load results
results = load_results(results_dir)
-
if not results:
- print("No results found to analyze")
+ print(f"No valid results found in {results_dir}")
sys.exit(1)
print(f"Loaded {len(results)} result files")
# Generate graphs
- print("Generating performance heatmap...")
- create_heatmap_analysis(results, output_dir)
-
- print("Generating performance trends...")
create_simple_performance_trends(results, output_dir)
+ create_heatmap_analysis(results, output_dir)
- print("Generating summary statistics...")
- summary = generate_summary_statistics(results, output_dir)
-
- print(f"\nAnalysis complete! Graphs saved to {output_dir}")
- print(f"Total configurations tested: {summary['total_tests']}")
- print(
- f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
- )
- print(
- f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
- )
+ print(f"Graphs generated in {output_dir}")
if __name__ == "__main__":
- main()
+ main()
\ No newline at end of file
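A note on the data layout the graph code above relies on: every chart reduces the nested query_performance structure (topk_* keys, each holding batch_* buckets with a queries_per_second field) to a single average figure per run. A minimal sketch of that averaging, assuming the JSON shape produced by milvus_benchmark.py; the example values are illustrative only:

def average_query_qps(result: dict) -> float:
    """Average queries_per_second over every topk_*/batch_* combination."""
    qp = result.get("query_performance", {})
    samples = [
        batch.get("queries_per_second", 0)
        for topk in qp.values()
        for batch in topk.values()
    ]
    return sum(samples) / len(samples) if samples else 0.0

# Illustrative input shape only
example = {
    "insert_performance": {"vectors_per_second": 12000},
    "query_performance": {
        "topk_10": {"batch_1": {"queries_per_second": 850}},
    },
}
print(average_query_qps(example))  # -> 850.0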
diff --git a/playbooks/roles/ai_collect_results/files/generate_html_report.py b/playbooks/roles/ai_collect_results/files/generate_html_report.py
index a205577c..01ec734c 100755
--- a/playbooks/roles/ai_collect_results/files/generate_html_report.py
+++ b/playbooks/roles/ai_collect_results/files/generate_html_report.py
@@ -69,6 +69,24 @@ HTML_TEMPLATE = """
color: #7f8c8d;
font-size: 0.9em;
}}
+ .config-box {{
+ background: #f8f9fa;
+ border-left: 4px solid #3498db;
+ padding: 15px;
+ margin: 20px 0;
+ border-radius: 4px;
+ }}
+ .config-box h3 {{
+ margin-top: 0;
+ color: #2c3e50;
+ }}
+ .config-box ul {{
+ margin: 10px 0;
+ padding-left: 20px;
+ }}
+ .config-box li {{
+ margin: 5px 0;
+ }}
.section {{
background: white;
padding: 30px;
@@ -162,15 +180,16 @@ HTML_TEMPLATE = """
</head>
<body>
<div class="header">
- <h1>AI Vector Database Benchmark Results</h1>
+ <h1>Milvus Vector Database Benchmark Results</h1>
<div class="subtitle">Generated on {timestamp}</div>
</div>
<nav class="navigation">
<ul>
<li><a href="#summary">Summary</a></li>
+ {filesystem_nav_items}
<li><a href="#performance-metrics">Performance Metrics</a></li>
- <li><a href="#performance-trends">Performance Trends</a></li>
+ <li><a href="#performance-heatmap">Performance Heatmap</a></li>
<li><a href="#detailed-results">Detailed Results</a></li>
</ul>
</nav>
@@ -192,34 +211,40 @@ HTML_TEMPLATE = """
<div class="label">{best_query_config}</div>
</div>
<div class="card">
- <h3>Test Runs</h3>
- <div class="value">{total_tests}</div>
- <div class="label">Benchmark Executions</div>
+ <h3>{fourth_card_title}</h3>
+ <div class="value">{fourth_card_value}</div>
+ <div class="label">{fourth_card_label}</div>
</div>
</div>
- <div id="performance-metrics" class="section">
- <h2>Performance Metrics</h2>
- <p>Key performance indicators for Milvus vector database operations.</p>
+ {filesystem_comparison_section}
+
+ {block_size_analysis_section}
+
+ <div id="performance-heatmap" class="section">
+ <h2>Performance Heatmap</h2>
+ <p>Heatmap visualization showing performance metrics across all tested configurations.</p>
<div class="graph-container">
- <img src="graphs/performance_heatmap.png" alt="Performance Metrics">
+ <img src="graphs/performance_heatmap.png" alt="Performance Heatmap">
</div>
</div>
- <div id="performance-trends" class="section">
- <h2>Performance Trends</h2>
- <p>Performance comparison between baseline and development configurations.</p>
- <div class="graph-container">
- <img src="graphs/performance_trends.png" alt="Performance Trends">
+ <div id="performance-metrics" class="section">
+ <h2>Performance Metrics</h2>
+ {config_summary}
+ <div class="graph-grid">
+ {performance_trend_graphs}
</div>
</div>
<div id="detailed-results" class="section">
- <h2>Detailed Results Table</h2>
+ <h2>Milvus Performance by Storage Filesystem</h2>
+            <p>This table shows how the Milvus vector database performs when its data is stored on different filesystem types and configurations.</p>
<table class="results-table">
<thead>
<tr>
- <th>Host</th>
+ <th>Filesystem</th>
+ <th>Configuration</th>
<th>Type</th>
<th>Insert QPS</th>
<th>Query QPS</th>
@@ -260,51 +285,77 @@ def load_results(results_dir):
data = json.load(f)
# Get filesystem from JSON data first, then fallback to filename parsing
filename = os.path.basename(json_file)
-
+
# Skip results without valid performance data
insert_perf = data.get("insert_performance", {})
query_perf = data.get("query_performance", {})
if not insert_perf or not query_perf:
continue
-
+
# Get filesystem from JSON data
fs_type = data.get("filesystem", None)
-
- # If not in JSON, try to parse from filename (backwards compatibility)
- if not fs_type and "debian13-ai" in filename:
- host_parts = (
- filename.replace("results_debian13-ai-", "")
- .replace("_1.json", "")
+
+ # Always try to parse from filename first since JSON data might be wrong
+ if "-ai-" in filename:
+ # Handle both debian13-ai- and prod-ai- prefixes
+ cleaned_filename = filename.replace("results_", "")
+
+ # Extract the part after -ai-
+ if "debian13-ai-" in cleaned_filename:
+ host_part = cleaned_filename.replace("debian13-ai-", "")
+ elif "prod-ai-" in cleaned_filename:
+ host_part = cleaned_filename.replace("prod-ai-", "")
+ else:
+ # Generic extraction
+ ai_index = cleaned_filename.find("-ai-")
+ if ai_index != -1:
+ host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-"
+ else:
+ host_part = cleaned_filename
+
+ # Remove file extensions and dev suffix
+ host_part = (
+ host_part.replace("_1.json", "")
.replace("_2.json", "")
.replace("_3.json", "")
- .split("-")
+ .replace("-dev", "")
)
- if "xfs" in host_parts[0]:
+
+ # Parse filesystem type and block size
+ if host_part.startswith("xfs-"):
fs_type = "xfs"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "ext4" in host_parts[0]:
+ # Extract block size: xfs-4k-4ks -> 4k
+ parts = host_part.split("-")
+ if len(parts) >= 2:
+ block_size = parts[1] # 4k, 16k, 32k, 64k
+ else:
+ block_size = "4k"
+ elif host_part.startswith("ext4-"):
fs_type = "ext4"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "btrfs" in host_parts[0]:
+ parts = host_part.split("-")
+ block_size = parts[1] if len(parts) > 1 else "4k"
+ elif host_part.startswith("btrfs"):
fs_type = "btrfs"
block_size = "default"
else:
- fs_type = "unknown"
- block_size = "unknown"
+ # Fallback to JSON data if available
+ if not fs_type:
+ fs_type = "unknown"
+ block_size = "unknown"
else:
# Set appropriate block size based on filesystem
if fs_type == "btrfs":
block_size = "default"
else:
block_size = data.get("block_size", "default")
-
+
# Default to unknown if still not found
if not fs_type:
fs_type = "unknown"
block_size = "unknown"
-
+
is_dev = "dev" in filename
-
+
# Calculate average QPS from query performance data
query_qps = 0
query_count = 0
@@ -316,7 +367,7 @@ def load_results(results_dir):
query_count += 1
if query_count > 0:
query_qps = query_qps / query_count
-
+
results.append(
{
"host": filename.replace("results_", "").replace(".json", ""),
@@ -348,12 +399,36 @@ def generate_table_rows(results, best_configs):
if config_key in best_configs:
row_class += " best-config"
+ # Generate descriptive labels showing Milvus is running on this filesystem
+ if result["filesystem"] == "xfs" and result["block_size"] != "default":
+ storage_label = f"XFS {result['block_size'].upper()}"
+ config_details = f"Block size: {result['block_size']}, Milvus data on XFS"
+ elif result["filesystem"] == "ext4":
+ storage_label = "EXT4"
+ if "bigalloc" in result.get("host", "").lower():
+ config_details = "EXT4 with bigalloc, Milvus data on ext4"
+ else:
+ config_details = (
+ f"Block size: {result['block_size']}, Milvus data on ext4"
+ )
+ elif result["filesystem"] == "btrfs":
+ storage_label = "BTRFS"
+ config_details = "Default Btrfs settings, Milvus data on Btrfs"
+ else:
+ storage_label = result["filesystem"].upper()
+ config_details = f"Milvus data on {result['filesystem']}"
+
+ # Extract clean node identifier from hostname
+ node_name = result["host"].replace("results_", "").replace(".json", "")
+
row = f"""
<tr class="{row_class}">
- <td>{result['host']}</td>
+ <td><strong>{storage_label}</strong></td>
+ <td>{config_details}</td>
<td>{result['type']}</td>
<td>{result['insert_qps']:,}</td>
<td>{result['query_qps']:,}</td>
+ <td><code>{node_name}</code></td>
<td>{result['timestamp']}</td>
</tr>
"""
@@ -362,10 +437,66 @@ def generate_table_rows(results, best_configs):
return "\n".join(rows)
+def generate_config_summary(results_dir):
+ """Generate configuration summary HTML from results"""
+ # Try to load first result file to get configuration
+ result_files = glob.glob(os.path.join(results_dir, "results_*.json"))
+ if not result_files:
+ return ""
+
+ try:
+ with open(result_files[0], "r") as f:
+ data = json.load(f)
+ config = data.get("config", {})
+
+ # Format configuration details
+ config_html = """
+ <div class="config-box">
+ <h3>Test Configuration</h3>
+ <ul>
+ <li><strong>Vector Dataset Size:</strong> {:,} vectors</li>
+ <li><strong>Vector Dimensions:</strong> {}</li>
+ <li><strong>Index Type:</strong> {} (M={}, ef_construction={}, ef={})</li>
+ <li><strong>Benchmark Runtime:</strong> {} seconds</li>
+ <li><strong>Batch Size:</strong> {:,}</li>
+ <li><strong>Test Iterations:</strong> {} runs with identical configuration</li>
+ </ul>
+ </div>
+ """.format(
+ config.get("vector_dataset_size", "N/A"),
+ config.get("vector_dimensions", "N/A"),
+ config.get("index_type", "N/A"),
+ config.get("index_hnsw_m", "N/A"),
+ config.get("index_hnsw_ef_construction", "N/A"),
+ config.get("index_hnsw_ef", "N/A"),
+ config.get("benchmark_runtime", "N/A"),
+ config.get("batch_size", "N/A"),
+ len(result_files),
+ )
+ return config_html
+ except Exception as e:
+ print(f"Warning: Could not generate config summary: {e}")
+ return ""
+
+
def find_performance_trend_graphs(graphs_dir):
- """Find performance trend graph"""
- # Not used in basic implementation since we embed the graph directly
- return ""
+ """Find performance trend graphs"""
+ graphs = []
+ # Look for filesystem-specific graphs in multi-fs mode
+ for fs in ["xfs", "ext4", "btrfs"]:
+ graph_path = f"{fs}_performance_trends.png"
+ if os.path.exists(os.path.join(graphs_dir, graph_path)):
+ graphs.append(
+ f'<div class="graph-container"><img src="graphs/{graph_path}" alt="{fs.upper()} Performance Trends"></div>'
+ )
+ # Fallback to simple performance trends for single mode
+ if not graphs and os.path.exists(
+ os.path.join(graphs_dir, "performance_trends.png")
+ ):
+ graphs.append(
+ '<div class="graph-container"><img src="graphs/performance_trends.png" alt="Performance Trends"></div>'
+ )
+ return "\n".join(graphs)
def generate_html_report(results_dir, graphs_dir, output_path):
@@ -393,6 +524,50 @@ def generate_html_report(results_dir, graphs_dir, output_path):
if summary["performance_summary"]["best_query_qps"]["config"]:
best_configs.add(summary["performance_summary"]["best_query_qps"]["config"])
+ # Check if multi-filesystem testing is enabled (more than one filesystem)
+ filesystems_tested = summary.get("filesystems_tested", [])
+ is_multifs_enabled = len(filesystems_tested) > 1
+
+ # Generate conditional sections based on multi-fs status
+ if is_multifs_enabled:
+ filesystem_nav_items = """
+ <li><a href="#filesystem-comparison">Filesystem Comparison</a></li>
+ <li><a href="#block-size-analysis">Block Size Analysis</a></li>"""
+
+ filesystem_comparison_section = """<div id="filesystem-comparison" class="section">
+ <h2>Milvus Storage Filesystem Comparison</h2>
+ <p>Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.</p>
+ <div class="graph-container">
+ <img src="graphs/filesystem_comparison.png" alt="Filesystem Comparison">
+ </div>
+ </div>"""
+
+ block_size_analysis_section = """<div id="block-size-analysis" class="section">
+ <h2>XFS Block Size Analysis</h2>
+ <p>Performance analysis of XFS filesystem with different block sizes (4K, 16K, 32K, 64K).</p>
+ <div class="graph-container">
+ <img src="graphs/xfs_block_size_analysis.png" alt="XFS Block Size Analysis">
+ </div>
+ </div>"""
+
+ # Multi-fs mode: show filesystem info
+ fourth_card_title = "Storage Filesystems"
+ fourth_card_value = str(len(filesystems_tested))
+ fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data"
+ else:
+ # Single filesystem mode - hide multi-fs sections
+ filesystem_nav_items = ""
+ filesystem_comparison_section = ""
+ block_size_analysis_section = ""
+
+ # Single mode: show test iterations
+ fourth_card_title = "Test Iterations"
+ fourth_card_value = str(summary["total_tests"])
+ fourth_card_label = "Identical Configuration Runs"
+
+ # Generate configuration summary
+ config_summary = generate_config_summary(results_dir)
+
# Generate HTML
html_content = HTML_TEMPLATE.format(
timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
@@ -401,6 +576,14 @@ def generate_html_report(results_dir, graphs_dir, output_path):
best_insert_config=summary["performance_summary"]["best_insert_qps"]["config"],
best_query_qps=f"{summary['performance_summary']['best_query_qps']['value']:,}",
best_query_config=summary["performance_summary"]["best_query_qps"]["config"],
+ fourth_card_title=fourth_card_title,
+ fourth_card_value=fourth_card_value,
+ fourth_card_label=fourth_card_label,
+ filesystem_nav_items=filesystem_nav_items,
+ filesystem_comparison_section=filesystem_comparison_section,
+ block_size_analysis_section=block_size_analysis_section,
+ config_summary=config_summary,
+ performance_trend_graphs=find_performance_trend_graphs(graphs_dir),
table_rows=generate_table_rows(results, best_configs),
)
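The report generator switches between single- and multi-filesystem layouts purely by formatting the optional placeholders with empty strings, so the same HTML_TEMPLATE serves both modes. A minimal sketch of that pattern, using a stand-in template rather than the real HTML_TEMPLATE:

# Placeholder names match the patch; the template string is a stand-in.
TEMPLATE = "<nav>{filesystem_nav_items}</nav>\n{filesystem_comparison_section}"

def render(filesystems_tested):
    multifs = len(filesystems_tested) > 1
    return TEMPLATE.format(
        filesystem_nav_items=(
            '<li><a href="#filesystem-comparison">Filesystem Comparison</a></li>'
            if multifs
            else ""
        ),
        filesystem_comparison_section=(
            '<div id="filesystem-comparison">...</div>' if multifs else ""
        ),
    )

print(render(["xfs"]))          # optional pieces collapse to empty strings
print(render(["xfs", "ext4"]))  # multi-fs navigation and section are emitted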
diff --git a/playbooks/roles/ai_collect_results/tasks/main.yml b/playbooks/roles/ai_collect_results/tasks/main.yml
index 6a15d89c..9586890a 100644
--- a/playbooks/roles/ai_collect_results/tasks/main.yml
+++ b/playbooks/roles/ai_collect_results/tasks/main.yml
@@ -134,13 +134,22 @@
ansible.builtin.command: >
python3 {{ local_scripts_dir }}/analyze_results.py
--results-dir {{ local_results_dir }}
- --output-dir {{ local_results_dir }}
+ --output-dir {{ local_results_dir }}/graphs
{% if ai_benchmark_enable_graphing | bool %}--config {{ local_scripts_dir }}/analysis_config.json{% endif %}
register: analysis_result
run_once: true
delegate_to: localhost
when: collected_results.files is defined and collected_results.files | length > 0
tags: ['results', 'analysis']
+ failed_when: analysis_result.rc != 0
+
+- name: Display analysis script output
+ ansible.builtin.debug:
+ var: analysis_result
+ run_once: true
+ delegate_to: localhost
+ when: collected_results.files is defined and collected_results.files | length > 0
+ tags: ['results', 'analysis']
- name: Create graphs directory
@@ -155,35 +164,8 @@
- collected_results.files | length > 0
tags: ['results', 'graphs']
-- name: Generate performance graphs
- ansible.builtin.command: >
- python3 {{ local_scripts_dir }}/generate_better_graphs.py
- {{ local_results_dir }}
- {{ local_results_dir }}/graphs
- register: graph_generation_result
- failed_when: false
- run_once: true
- delegate_to: localhost
- when:
- - collected_results.files is defined
- - collected_results.files | length > 0
- - ai_benchmark_enable_graphing|bool
- tags: ['results', 'graphs']
-
-- name: Fallback to basic graphs if better graphs fail
- ansible.builtin.command: >
- python3 {{ local_scripts_dir }}/generate_graphs.py
- {{ local_results_dir }}
- {{ local_results_dir }}/graphs
- run_once: true
- delegate_to: localhost
- when:
- - collected_results.files is defined
- - collected_results.files | length > 0
- - ai_benchmark_enable_graphing|bool
- - graph_generation_result is defined
- - graph_generation_result.rc != 0
- tags: ['results', 'graphs']
+# Graph generation is now handled by analyze_results.py above
+# No separate graph generation step needed
- name: Generate HTML report
ansible.builtin.command: >
diff --git a/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2 b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
index 5a879649..459cd602 100644
--- a/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
+++ b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
@@ -2,5 +2,5 @@
"enable_graphing": {{ ai_benchmark_enable_graphing|default(true)|lower }},
"graph_format": "{{ ai_benchmark_graph_format|default('png') }}",
"graph_dpi": {{ ai_benchmark_graph_dpi|default(150) }},
- "graph_theme": "{{ ai_benchmark_graph_theme|default('seaborn') }}"
+ "graph_theme": "{{ ai_benchmark_graph_theme|default('default') }}"
}
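The theme default moves from "seaborn" to "default" presumably because newer matplotlib releases deprecate and then drop the bare "seaborn" style name in favour of the "seaborn-v0_8*" variants, so the old value can fail at plt.style.use() time. A defensive sketch of how a consumer of analysis_config.json could apply the theme; the config path and helper name are illustrative, since analyze_results.py itself is not part of this hunk:

import json

import matplotlib.pyplot as plt

def apply_graph_theme(config_path="analysis_config.json"):
    with open(config_path) as f:
        cfg = json.load(f)
    theme = cfg.get("graph_theme", "default")
    # Fall back instead of crashing when the requested style does not exist.
    plt.style.use(theme if theme in plt.style.available else "default")
    return cfg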
diff --git a/playbooks/roles/ai_milvus_storage/tasks/main.yml b/playbooks/roles/ai_milvus_storage/tasks/main.yml
new file mode 100644
index 00000000..f8e4ea63
--- /dev/null
+++ b/playbooks/roles/ai_milvus_storage/tasks/main.yml
@@ -0,0 +1,161 @@
+---
+- name: Import optional extra_args file
+ include_vars: "{{ item }}"
+ ignore_errors: yes
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Milvus storage setup
+ when: ai_milvus_storage_enable|bool
+ block:
+ - name: Install filesystem utilities
+ package:
+ name:
+ - xfsprogs
+ - e2fsprogs
+ - btrfs-progs
+ state: present
+ become: yes
+ become_method: sudo
+
+ - name: Check if device exists
+ stat:
+ path: "{{ ai_milvus_device }}"
+ register: milvus_device_stat
+ failed_when: not milvus_device_stat.stat.exists
+
+ - name: Check if Milvus storage is already mounted
+ command: mountpoint -q {{ ai_milvus_mount_point }}
+ register: milvus_mount_check
+ changed_when: false
+ failed_when: false
+
+ - name: Setup Milvus storage filesystem
+ when: milvus_mount_check.rc != 0
+ block:
+ - name: Create Milvus mount point directory
+ file:
+ path: "{{ ai_milvus_mount_point }}"
+ state: directory
+ mode: '0755'
+ become: yes
+ become_method: sudo
+
+ - name: Detect filesystem type from node name
+ set_fact:
+ detected_fstype: >-
+ {%- if 'xfs' in inventory_hostname -%}
+ xfs
+ {%- elif 'ext4' in inventory_hostname -%}
+ ext4
+ {%- elif 'btrfs' in inventory_hostname -%}
+ btrfs
+ {%- else -%}
+ {{ ai_milvus_fstype | default('xfs') }}
+ {%- endif -%}
+ when: ai_milvus_use_node_fs | default(false) | bool
+
+ - name: Detect XFS parameters from node name
+ set_fact:
+ milvus_xfs_blocksize: >-
+ {%- if '64k' in inventory_hostname -%}
+ 65536
+ {%- elif '32k' in inventory_hostname -%}
+ 32768
+ {%- elif '16k' in inventory_hostname -%}
+ 16384
+ {%- else -%}
+ {{ ai_milvus_xfs_blocksize | default(4096) }}
+ {%- endif -%}
+ milvus_xfs_sectorsize: >-
+ {%- if '4ks' in inventory_hostname -%}
+ 4096
+ {%- elif '512s' in inventory_hostname -%}
+ 512
+ {%- else -%}
+ {{ ai_milvus_xfs_sectorsize | default(4096) }}
+ {%- endif -%}
+ when:
+ - ai_milvus_use_node_fs | default(false) | bool
+ - detected_fstype | default(ai_milvus_fstype) == 'xfs'
+
+ - name: Detect ext4 parameters from node name
+ set_fact:
+ milvus_ext4_opts: >-
+ {%- if '16k' in inventory_hostname and 'bigalloc' in inventory_hostname -%}
+ -F -b 4096 -C 16384 -O bigalloc
+ {%- elif '4k' in inventory_hostname -%}
+ -F -b 4096
+ {%- else -%}
+ {{ ai_milvus_ext4_mkfs_opts | default('-F') }}
+ {%- endif -%}
+ when:
+ - ai_milvus_use_node_fs | default(false) | bool
+ - detected_fstype | default(ai_milvus_fstype) == 'ext4'
+
+ - name: Set final filesystem type
+ set_fact:
+ milvus_fstype: "{{ detected_fstype | default(ai_milvus_fstype | default('xfs')) }}"
+
+ - name: Format device with XFS
+ command: >
+ mkfs.xfs -f
+ -b size={{ milvus_xfs_blocksize | default(ai_milvus_xfs_blocksize | default(4096)) }}
+ -s size={{ milvus_xfs_sectorsize | default(ai_milvus_xfs_sectorsize | default(4096)) }}
+ {{ ai_milvus_xfs_mkfs_opts | default('') }}
+ {{ ai_milvus_device }}
+ when: milvus_fstype == "xfs"
+ become: yes
+ become_method: sudo
+
+ - name: Format device with Btrfs
+ command: mkfs.btrfs {{ ai_milvus_btrfs_mkfs_opts | default('-f') }} {{ ai_milvus_device }}
+ when: milvus_fstype == "btrfs"
+ become: yes
+ become_method: sudo
+
+ - name: Format device with ext4
+ command: mkfs.ext4 {{ milvus_ext4_opts | default(ai_milvus_ext4_mkfs_opts | default('-F')) }} {{ ai_milvus_device }}
+ when: milvus_fstype == "ext4"
+ become: yes
+ become_method: sudo
+
+ - name: Mount Milvus storage filesystem
+ mount:
+ path: "{{ ai_milvus_mount_point }}"
+ src: "{{ ai_milvus_device }}"
+ fstype: "{{ milvus_fstype }}"
+ opts: defaults,noatime
+ state: mounted
+ become: yes
+ become_method: sudo
+
+ - name: Add Milvus storage mount to fstab
+ mount:
+ path: "{{ ai_milvus_mount_point }}"
+ src: "{{ ai_milvus_device }}"
+ fstype: "{{ milvus_fstype }}"
+ opts: defaults,noatime
+ state: present
+ become: yes
+ become_method: sudo
+
+ - name: Ensure Milvus directories exist with proper permissions
+ file:
+ path: "{{ item }}"
+ state: directory
+ mode: '0755'
+ owner: root
+ group: root
+ become: yes
+ become_method: sudo
+ loop:
+ - "{{ ai_milvus_mount_point }}"
+ - "{{ ai_milvus_mount_point }}/data"
+ - "{{ ai_milvus_mount_point }}/etcd"
+ - "{{ ai_milvus_mount_point }}/minio"
+
+ - name: Display Milvus storage setup complete
+ debug:
+ msg: "Milvus storage has been prepared at: {{ ai_milvus_mount_point }} with filesystem: {{ milvus_fstype | default(ai_milvus_fstype | default('xfs')) }}"
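When ai_milvus_use_node_fs is set, the role derives its mkfs parameters purely from the inventory hostname. A small Python sketch that mirrors those Jinja expressions, handy for sanity-checking a node naming scheme before provisioning; it is illustrative only and not part of the role:

def mkfs_params(hostname: str) -> dict:
    """Mirror the set_fact detection done by the ai_milvus_storage role."""
    if "xfs" in hostname:
        blocks = {"64k": 65536, "32k": 32768, "16k": 16384}
        bsize = next((v for k, v in blocks.items() if k in hostname), 4096)
        ssize = 512 if "512s" in hostname else 4096
        return {"fstype": "xfs", "blocksize": bsize, "sectorsize": ssize}
    if "ext4" in hostname:
        if "16k" in hostname and "bigalloc" in hostname:
            opts = "-F -b 4096 -C 16384 -O bigalloc"
        elif "4k" in hostname:
            opts = "-F -b 4096"
        else:
            opts = "-F"
        return {"fstype": "ext4", "mkfs_opts": opts}
    if "btrfs" in hostname:
        return {"fstype": "btrfs", "mkfs_opts": "-f"}
    return {"fstype": "xfs"}  # role default when nothing matches

print(mkfs_params("debian13-ai-xfs-16k-4ks"))
print(mkfs_params("debian13-ai-ext4-16k-bigalloc"))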
diff --git a/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml b/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
new file mode 100644
index 00000000..b4453b81
--- /dev/null
+++ b/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
@@ -0,0 +1,279 @@
+---
+- name: Create multi-filesystem comparison script
+ copy:
+ content: |
+ #!/usr/bin/env python3
+ """
+ Multi-Filesystem AI Benchmark Comparison Report Generator
+
+ This script analyzes AI benchmark results across different filesystem
+ configurations and generates a comprehensive comparison report.
+ """
+
+ import json
+ import glob
+ import os
+ import sys
+ from datetime import datetime
+ from typing import Dict, List, Any
+
+ def load_filesystem_results(results_dir: str) -> Dict[str, Any]:
+ """Load results from all filesystem configurations"""
+ fs_results = {}
+
+ # Find all filesystem configuration directories
+ fs_dirs = [d for d in os.listdir(results_dir)
+ if os.path.isdir(os.path.join(results_dir, d)) and d != 'comparison']
+
+ for fs_name in fs_dirs:
+ fs_path = os.path.join(results_dir, fs_name)
+
+ # Load configuration
+ config_file = os.path.join(fs_path, 'filesystem_config.txt')
+ config_info = {}
+ if os.path.exists(config_file):
+ with open(config_file, 'r') as f:
+ config_info['config_text'] = f.read()
+
+ # Load benchmark results
+ result_files = glob.glob(os.path.join(fs_path, 'results_*.json'))
+ benchmark_results = []
+
+ for result_file in result_files:
+ try:
+ with open(result_file, 'r') as f:
+ data = json.load(f)
+ benchmark_results.append(data)
+ except Exception as e:
+ print(f"Error loading {result_file}: {e}")
+
+ fs_results[fs_name] = {
+ 'config': config_info,
+ 'results': benchmark_results,
+ 'path': fs_path
+ }
+
+ return fs_results
+
+ def generate_comparison_report(fs_results: Dict[str, Any], output_dir: str):
+ """Generate HTML comparison report"""
+ html = []
+
+ # HTML header
+ html.append("<!DOCTYPE html>")
+ html.append("<html lang='en'>")
+ html.append("<head>")
+ html.append(" <meta charset='UTF-8'>")
+ html.append(" <title>AI Multi-Filesystem Benchmark Comparison</title>")
+ html.append(" <style>")
+ html.append(" body { font-family: Arial, sans-serif; margin: 20px; }")
+ html.append(" .header { background-color: #f0f8ff; padding: 20px; border-radius: 5px; margin-bottom: 20px; }")
+ html.append(" .fs-section { margin-bottom: 30px; border: 1px solid #ddd; padding: 15px; border-radius: 5px; }")
+ html.append(" .comparison-table { width: 100%; border-collapse: collapse; margin: 20px 0; }")
+ html.append(" .comparison-table th, .comparison-table td { border: 1px solid #ddd; padding: 8px; text-align: left; }")
+ html.append(" .comparison-table th { background-color: #f2f2f2; }")
+ html.append(" .metric-best { background-color: #d4edda; font-weight: bold; }")
+ html.append(" .metric-worst { background-color: #f8d7da; }")
+ html.append(" .chart-container { margin: 20px 0; padding: 15px; background-color: #f9f9f9; border-radius: 5px; }")
+ html.append(" </style>")
+ html.append("</head>")
+ html.append("<body>")
+
+ # Report header
+ html.append(" <div class='header'>")
+ html.append(" <h1>🗂️ AI Multi-Filesystem Benchmark Comparison</h1>")
+ html.append(f" <p><strong>Generated:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>")
+ html.append(f" <p><strong>Filesystem Configurations Tested:</strong> {len(fs_results)}</p>")
+ html.append(" </div>")
+
+ # Performance comparison table
+ html.append(" <h2>📊 Performance Comparison Summary</h2>")
+ html.append(" <table class='comparison-table'>")
+ html.append(" <tr>")
+ html.append(" <th>Filesystem</th>")
+ html.append(" <th>Avg Insert Rate (vectors/sec)</th>")
+ html.append(" <th>Avg Index Time (sec)</th>")
+ html.append(" <th>Avg Query QPS (Top-10, Batch-1)</th>")
+ html.append(" <th>Avg Query Latency (ms)</th>")
+ html.append(" </tr>")
+
+ # Calculate metrics for comparison
+ fs_metrics = {}
+ for fs_name, fs_data in fs_results.items():
+ if not fs_data['results']:
+ continue
+
+ # Calculate averages across all iterations
+ insert_rates = []
+ index_times = []
+ query_qps = []
+ query_latencies = []
+
+ for result in fs_data['results']:
+ if 'insert_performance' in result:
+ insert_rates.append(result['insert_performance'].get('vectors_per_second', 0))
+
+ if 'index_performance' in result:
+ index_times.append(result['index_performance'].get('creation_time_seconds', 0))
+
+ if 'query_performance' in result:
+ qp = result['query_performance']
+ if 'topk_10' in qp and 'batch_1' in qp['topk_10']:
+ batch_data = qp['topk_10']['batch_1']
+ query_qps.append(batch_data.get('queries_per_second', 0))
+ query_latencies.append(batch_data.get('average_time_seconds', 0) * 1000)
+
+ fs_metrics[fs_name] = {
+ 'insert_rate': sum(insert_rates) / len(insert_rates) if insert_rates else 0,
+ 'index_time': sum(index_times) / len(index_times) if index_times else 0,
+ 'query_qps': sum(query_qps) / len(query_qps) if query_qps else 0,
+ 'query_latency': sum(query_latencies) / len(query_latencies) if query_latencies else 0
+ }
+
+ # Find best/worst for highlighting
+ if fs_metrics:
+ best_insert = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['insert_rate'])
+ best_index = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['index_time'])
+ best_qps = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_qps'])
+ best_latency = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_latency'])
+
+ worst_insert = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['insert_rate'])
+ worst_index = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['index_time'])
+ worst_qps = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_qps'])
+ worst_latency = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_latency'])
+
+ # Generate comparison rows
+ for fs_name, metrics in fs_metrics.items():
+ html.append(" <tr>")
+ html.append(f" <td><strong>{fs_name}</strong></td>")
+
+ # Insert rate
+ cell_class = ""
+ if fs_name == best_insert:
+ cell_class = "metric-best"
+ elif fs_name == worst_insert:
+ cell_class = "metric-worst"
+ html.append(f" <td class='{cell_class}'>{metrics['insert_rate']:.2f}</td>")
+
+ # Index time
+ cell_class = ""
+ if fs_name == best_index:
+ cell_class = "metric-best"
+ elif fs_name == worst_index:
+ cell_class = "metric-worst"
+ html.append(f" <td class='{cell_class}'>{metrics['index_time']:.2f}</td>")
+
+ # Query QPS
+ cell_class = ""
+ if fs_name == best_qps:
+ cell_class = "metric-best"
+ elif fs_name == worst_qps:
+ cell_class = "metric-worst"
+ html.append(f" <td class='{cell_class}'>{metrics['query_qps']:.2f}</td>")
+
+ # Query latency
+ cell_class = ""
+ if fs_name == best_latency:
+ cell_class = "metric-best"
+ elif fs_name == worst_latency:
+ cell_class = "metric-worst"
+ html.append(f" <td class='{cell_class}'>{metrics['query_latency']:.2f}</td>")
+
+ html.append(" </tr>")
+
+ html.append(" </table>")
+
+ # Individual filesystem details
+ html.append(" <h2>📁 Individual Filesystem Details</h2>")
+ for fs_name, fs_data in fs_results.items():
+ html.append(f" <div class='fs-section'>")
+ html.append(f" <h3>{fs_name}</h3>")
+
+ if 'config_text' in fs_data['config']:
+ html.append(" <h4>Configuration:</h4>")
+ html.append(" <pre>" + fs_data['config']['config_text'][:500] + "</pre>")
+
+ html.append(f" <p><strong>Benchmark Iterations:</strong> {len(fs_data['results'])}</p>")
+
+ if fs_name in fs_metrics:
+ metrics = fs_metrics[fs_name]
+ html.append(" <table class='comparison-table'>")
+ html.append(" <tr><th>Metric</th><th>Value</th></tr>")
+ html.append(f" <tr><td>Average Insert Rate</td><td>{metrics['insert_rate']:.2f} vectors/sec</td></tr>")
+ html.append(f" <tr><td>Average Index Time</td><td>{metrics['index_time']:.2f} seconds</td></tr>")
+ html.append(f" <tr><td>Average Query QPS</td><td>{metrics['query_qps']:.2f}</td></tr>")
+ html.append(f" <tr><td>Average Query Latency</td><td>{metrics['query_latency']:.2f} ms</td></tr>")
+ html.append(" </table>")
+
+ html.append(" </div>")
+
+ # Footer
+ html.append(" <div style='margin-top: 40px; padding: 20px; background-color: #f8f9fa; border-radius: 5px;'>")
+ html.append(" <h3>📝 Analysis Notes</h3>")
+ html.append(" <ul>")
+ html.append(" <li>Green highlighting indicates the best performing filesystem for each metric</li>")
+ html.append(" <li>Red highlighting indicates the worst performing filesystem for each metric</li>")
+ html.append(" <li>Results are averaged across all benchmark iterations for each filesystem</li>")
+ html.append(" <li>Performance can vary based on hardware, kernel version, and workload characteristics</li>")
+ html.append(" </ul>")
+ html.append(" </div>")
+
+ html.append("</body>")
+ html.append("</html>")
+
+ # Write HTML report
+ report_file = os.path.join(output_dir, "multi_filesystem_comparison.html")
+ with open(report_file, 'w') as f:
+ f.write("\n".join(html))
+
+ print(f"Multi-filesystem comparison report generated: {report_file}")
+
+ # Generate JSON summary
+ summary_data = {
+ 'generation_time': datetime.now().isoformat(),
+ 'filesystem_count': len(fs_results),
+ 'metrics_summary': fs_metrics,
+ 'raw_results': {fs: data['results'] for fs, data in fs_results.items()}
+ }
+
+ summary_file = os.path.join(output_dir, "multi_filesystem_summary.json")
+ with open(summary_file, 'w') as f:
+ json.dump(summary_data, f, indent=2)
+
+ print(f"Multi-filesystem summary data: {summary_file}")
+
+ def main():
+ results_dir = "{{ ai_multifs_results_dir }}"
+ comparison_dir = os.path.join(results_dir, "comparison")
+ os.makedirs(comparison_dir, exist_ok=True)
+
+ print("Loading filesystem results...")
+ fs_results = load_filesystem_results(results_dir)
+
+ if not fs_results:
+ print("No filesystem results found!")
+ return 1
+
+ print(f"Found results for {len(fs_results)} filesystem configurations")
+ print("Generating comparison report...")
+
+ generate_comparison_report(fs_results, comparison_dir)
+
+ print("Multi-filesystem comparison completed!")
+ return 0
+
+ if __name__ == "__main__":
+ sys.exit(main())
+ dest: "{{ ai_multifs_results_dir }}/generate_comparison.py"
+ mode: '0755'
+
+- name: Run multi-filesystem comparison analysis
+ command: python3 {{ ai_multifs_results_dir }}/generate_comparison.py
+ register: comparison_result
+
+- name: Display comparison completion message
+ debug:
+ msg: |
+ Multi-filesystem comparison completed!
+ Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
+ Summary data: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_summary.json
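Besides the HTML report, the comparison step leaves a machine-readable multi_filesystem_summary.json next to it. A small consumer sketch, assuming the key layout written by generate_comparison.py above and an illustrative relative path under ai_multifs_results_dir:

import json

with open("comparison/multi_filesystem_summary.json") as f:
    summary = json.load(f)

print(f"Generated: {summary['generation_time']}")
print(f"Filesystems compared: {summary['filesystem_count']}")
for fs_name, metrics in summary["metrics_summary"].items():
    print(
        f"{fs_name}: {metrics['insert_rate']:.0f} vectors/s insert, "
        f"{metrics['query_qps']:.0f} QPS, {metrics['query_latency']:.2f} ms latency"
    )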
diff --git a/playbooks/roles/ai_multifs_run/tasks/main.yml b/playbooks/roles/ai_multifs_run/tasks/main.yml
new file mode 100644
index 00000000..38dbba12
--- /dev/null
+++ b/playbooks/roles/ai_multifs_run/tasks/main.yml
@@ -0,0 +1,23 @@
+---
+- name: Import optional extra_args file
+ include_vars: "{{ item }}"
+ ignore_errors: yes
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Filter enabled filesystem configurations
+ set_fact:
+ enabled_fs_configs: "{{ ai_multifs_configurations | selectattr('enabled', 'equalto', true) | list }}"
+
+- name: Run AI benchmarks on each filesystem configuration
+ include_tasks: run_single_filesystem.yml
+ loop: "{{ enabled_fs_configs }}"
+ loop_control:
+ loop_var: fs_config
+ index_var: fs_index
+ when: enabled_fs_configs | length > 0
+
+- name: Generate multi-filesystem comparison report
+ include_tasks: generate_comparison.yml
+ when: enabled_fs_configs | length > 1
diff --git a/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml b/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
new file mode 100644
index 00000000..fd194550
--- /dev/null
+++ b/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
@@ -0,0 +1,104 @@
+---
+- name: Display current filesystem configuration
+ debug:
+ msg: "Testing filesystem configuration {{ fs_index + 1 }}/{{ enabled_fs_configs | length }}: {{ fs_config.name }}"
+
+- name: Unmount filesystem if mounted
+ mount:
+ path: "{{ ai_multifs_mount_point }}"
+ state: unmounted
+ ignore_errors: yes
+
+- name: Create filesystem with specific configuration
+ shell: "{{ fs_config.mkfs_cmd }} {{ ai_multifs_device }}"
+ register: mkfs_result
+
+- name: Display mkfs output
+ debug:
+ msg: "mkfs output: {{ mkfs_result.stdout }}"
+ when: mkfs_result.stdout != ""
+
+- name: Mount filesystem with specific options
+ mount:
+ path: "{{ ai_multifs_mount_point }}"
+ src: "{{ ai_multifs_device }}"
+ fstype: "{{ fs_config.filesystem }}"
+ opts: "{{ fs_config.mount_opts }}"
+ state: mounted
+
+- name: Create filesystem-specific results directory
+ file:
+ path: "{{ ai_multifs_results_dir }}/{{ fs_config.name }}"
+ state: directory
+ mode: '0755'
+
+- name: Update AI benchmark configuration for current filesystem
+ set_fact:
+ current_fs_benchmark_dir: "{{ ai_multifs_mount_point }}/ai-benchmark-data"
+ current_fs_results_dir: "{{ ai_multifs_results_dir }}/{{ fs_config.name }}"
+
+- name: Create AI benchmark data directory on current filesystem
+ file:
+ path: "{{ current_fs_benchmark_dir }}"
+ state: directory
+ mode: '0755'
+
+- name: Generate AI benchmark configuration for current filesystem
+ template:
+ src: milvus_config.json.j2
+ dest: "{{ current_fs_results_dir }}/milvus_config.json"
+ mode: '0644'
+
+- name: Run AI benchmark on current filesystem
+ shell: |
+ cd {{ current_fs_benchmark_dir }}
+ python3 {{ playbook_dir }}/roles/ai_run_benchmarks/files/milvus_benchmark.py \
+ --config {{ current_fs_results_dir }}/milvus_config.json \
+ --output {{ current_fs_results_dir }}/results_{{ fs_config.name }}_$(date +%Y%m%d_%H%M%S).json
+ register: benchmark_result
+ async: 7200 # 2 hour timeout
+ poll: 30
+
+- name: Display benchmark completion
+ debug:
+ msg: "Benchmark completed for {{ fs_config.name }}: {{ benchmark_result.stdout_lines[-5:] | default(['No output']) }}"
+
+- name: Record filesystem configuration metadata
+ copy:
+ content: |
+ # Filesystem Configuration: {{ fs_config.name }}
+ Filesystem Type: {{ fs_config.filesystem }}
+ mkfs Command: {{ fs_config.mkfs_cmd }}
+ Mount Options: {{ fs_config.mount_opts }}
+ Device: {{ ai_multifs_device }}
+ Mount Point: {{ ai_multifs_mount_point }}
+ Data Directory: {{ current_fs_benchmark_dir }}
+ Results Directory: {{ current_fs_results_dir }}
+ Test Start Time: {{ ansible_date_time.iso8601 }}
+
+ mkfs Output:
+ {{ mkfs_result.stdout }}
+ {{ mkfs_result.stderr }}
+ dest: "{{ current_fs_results_dir }}/filesystem_config.txt"
+ mode: '0644'
+
+- name: Capture filesystem statistics after benchmark
+ shell: |
+ echo "=== Filesystem Usage ===" > {{ current_fs_results_dir }}/filesystem_stats.txt
+ df -h {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt
+ echo "" >> {{ current_fs_results_dir }}/filesystem_stats.txt
+ echo "=== Filesystem Info ===" >> {{ current_fs_results_dir }}/filesystem_stats.txt
+ {% if fs_config.filesystem == 'xfs' %}
+ xfs_info {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
+ {% elif fs_config.filesystem == 'ext4' %}
+ tune2fs -l {{ ai_multifs_device }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
+ {% elif fs_config.filesystem == 'btrfs' %}
+ btrfs filesystem show {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
+ btrfs filesystem usage {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
+ {% endif %}
+ ignore_errors: yes
+
+- name: Unmount filesystem after benchmark
+ mount:
+ path: "{{ ai_multifs_mount_point }}"
+ state: unmounted
diff --git a/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2 b/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
new file mode 100644
index 00000000..6216bf46
--- /dev/null
+++ b/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
@@ -0,0 +1,42 @@
+{
+ "milvus": {
+ "host": "{{ ai_milvus_host }}",
+ "port": {{ ai_milvus_port }},
+ "database_name": "{{ ai_milvus_database_name }}_{{ fs_config.name }}"
+ },
+ "benchmark": {
+ "vector_dataset_size": {{ ai_vector_dataset_size }},
+ "vector_dimensions": {{ ai_vector_dimensions }},
+ "index_type": "{{ ai_index_type }}",
+ "iterations": {{ ai_benchmark_iterations }},
+ "runtime_seconds": {{ ai_benchmark_runtime }},
+ "warmup_seconds": {{ ai_benchmark_warmup_time }},
+ "query_patterns": {
+ "topk_1": {{ ai_benchmark_query_topk_1 | lower }},
+ "topk_10": {{ ai_benchmark_query_topk_10 | lower }},
+ "topk_100": {{ ai_benchmark_query_topk_100 | lower }}
+ },
+ "batch_sizes": {
+ "batch_1": {{ ai_benchmark_batch_1 | lower }},
+ "batch_10": {{ ai_benchmark_batch_10 | lower }},
+ "batch_100": {{ ai_benchmark_batch_100 | lower }}
+ }
+ },
+ "index_params": {
+{% if ai_index_type == "HNSW" %}
+ "M": {{ ai_index_hnsw_m }},
+ "efConstruction": {{ ai_index_hnsw_ef_construction }},
+ "ef": {{ ai_index_hnsw_ef }}
+{% elif ai_index_type == "IVF_FLAT" %}
+ "nlist": {{ ai_index_ivf_nlist }},
+ "nprobe": {{ ai_index_ivf_nprobe }}
+{% endif %}
+ },
+ "filesystem": {
+ "name": "{{ fs_config.name }}",
+ "type": "{{ fs_config.filesystem }}",
+ "mkfs_cmd": "{{ fs_config.mkfs_cmd }}",
+ "mount_opts": "{{ fs_config.mount_opts }}",
+ "data_directory": "{{ current_fs_benchmark_dir }}"
+ }
+}
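The rendered per-filesystem config is what the benchmark gets handed via --config. A sketch of how a consumer might read it, following the key layout of the template above; the actual argument parsing inside milvus_benchmark.py is outside this hunk, so treat the helper as illustrative:

import json

def load_benchmark_config(path="milvus_config.json"):
    with open(path) as f:
        cfg = json.load(f)
    milvus = cfg["milvus"]
    bench = cfg["benchmark"]
    fs = cfg["filesystem"]
    print(f"Target: {milvus['host']}:{milvus['port']} ({milvus['database_name']})")
    print(f"Vectors: {bench['vector_dataset_size']} x {bench['vector_dimensions']} dims")
    print(f"Index: {bench['index_type']} params={cfg['index_params']}")
    print(f"Milvus data on: {fs['type']} ({fs['name']}) at {fs['data_directory']}")
    return cfg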
diff --git a/playbooks/roles/ai_multifs_setup/defaults/main.yml b/playbooks/roles/ai_multifs_setup/defaults/main.yml
new file mode 100644
index 00000000..c35d179f
--- /dev/null
+++ b/playbooks/roles/ai_multifs_setup/defaults/main.yml
@@ -0,0 +1,49 @@
+---
+# Default values for AI multi-filesystem testing
+ai_multifs_results_dir: "/data/ai-multifs-benchmark"
+ai_multifs_device: "/dev/vdb"
+ai_multifs_mount_point: "/mnt/ai-multifs-test"
+
+# Filesystem configurations to test
+ai_multifs_configurations:
+ - name: "xfs_4k_4ks"
+ filesystem: "xfs"
+ mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=4096"
+ mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+ enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_4k_4ks }}"
+
+ - name: "xfs_16k_4ks"
+ filesystem: "xfs"
+ mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=16384"
+ mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+ enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_16k_4ks }}"
+
+ - name: "xfs_32k_4ks"
+ filesystem: "xfs"
+ mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=32768"
+ mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+ enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_32k_4ks }}"
+
+ - name: "xfs_64k_4ks"
+ filesystem: "xfs"
+ mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=65536"
+ mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+ enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_64k_4ks }}"
+
+ - name: "ext4_4k"
+ filesystem: "ext4"
+ mkfs_cmd: "mkfs.ext4 -F -b 4096"
+ mount_opts: "rw,relatime,data=ordered"
+ enabled: "{{ ai_multifs_test_ext4 and ai_multifs_ext4_4k }}"
+
+ - name: "ext4_16k_bigalloc"
+ filesystem: "ext4"
+ mkfs_cmd: "mkfs.ext4 -F -b 4096 -C 16384"
+ mount_opts: "rw,relatime,data=ordered"
+ enabled: "{{ ai_multifs_test_ext4 and ai_multifs_ext4_16k_bigalloc }}"
+
+ - name: "btrfs_default"
+ filesystem: "btrfs"
+ mkfs_cmd: "mkfs.btrfs -f"
+ mount_opts: "rw,relatime,space_cache=v2,discard=async"
+ enabled: "{{ ai_multifs_test_btrfs and ai_multifs_btrfs_default }}"
diff --git a/playbooks/roles/ai_multifs_setup/tasks/main.yml b/playbooks/roles/ai_multifs_setup/tasks/main.yml
new file mode 100644
index 00000000..28f3ec40
--- /dev/null
+++ b/playbooks/roles/ai_multifs_setup/tasks/main.yml
@@ -0,0 +1,70 @@
+---
+- name: Import optional extra_args file
+ include_vars: "{{ item }}"
+ ignore_errors: yes
+ with_items:
+ - "../extra_vars.yaml"
+ tags: vars
+
+- name: Create multi-filesystem results directory
+ file:
+ path: "{{ ai_multifs_results_dir }}"
+ state: directory
+ mode: '0755'
+
+- name: Create mount point directory
+ file:
+ path: "{{ ai_multifs_mount_point }}"
+ state: directory
+ mode: '0755'
+
+- name: Unmount any existing filesystem on mount point
+ mount:
+ path: "{{ ai_multifs_mount_point }}"
+ state: unmounted
+ ignore_errors: yes
+
+- name: Install required filesystem utilities
+ package:
+ name:
+ - xfsprogs
+ - e2fsprogs
+ - btrfs-progs
+ state: present
+
+- name: Filter enabled filesystem configurations
+ set_fact:
+ enabled_fs_configs: "{{ ai_multifs_configurations | selectattr('enabled', 'equalto', true) | list }}"
+
+- name: Display enabled filesystem configurations
+ debug:
+ msg: "Will test {{ enabled_fs_configs | length }} filesystem configurations: {{ enabled_fs_configs | map(attribute='name') | list }}"
+
+- name: Validate that device exists
+ stat:
+ path: "{{ ai_multifs_device }}"
+ register: device_stat
+ failed_when: not device_stat.stat.exists
+
+- name: Display device information
+ debug:
+ msg: "Using device {{ ai_multifs_device }} for multi-filesystem testing"
+
+- name: Create filesystem configuration summary
+ copy:
+ content: |
+ # AI Multi-Filesystem Testing Configuration
+ Generated: {{ ansible_date_time.iso8601 }}
+ Device: {{ ai_multifs_device }}
+ Mount Point: {{ ai_multifs_mount_point }}
+ Results Directory: {{ ai_multifs_results_dir }}
+
+ Enabled Filesystem Configurations:
+ {% for config in enabled_fs_configs %}
+ - {{ config.name }}:
+ Filesystem: {{ config.filesystem }}
+ mkfs command: {{ config.mkfs_cmd }}
+ Mount options: {{ config.mount_opts }}
+ {% endfor %}
+ dest: "{{ ai_multifs_results_dir }}/test_configuration.txt"
+ mode: '0644'
diff --git a/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
index 4ce14fb7..2aaa54ba 100644
--- a/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
+++ b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
@@ -54,67 +54,83 @@ class MilvusBenchmark:
)
self.logger = logging.getLogger(__name__)
- def get_filesystem_info(self, path: str = "/data") -> Dict[str, str]:
+ def get_filesystem_info(self, path: str = "/data/milvus") -> Dict[str, str]:
"""Detect filesystem type for the given path"""
- try:
- # Use df -T to get filesystem type
- result = subprocess.run(
- ["df", "-T", path], capture_output=True, text=True, check=True
- )
-
- lines = result.stdout.strip().split("\n")
- if len(lines) >= 2:
- # Second line contains the filesystem info
- # Format: Filesystem Type 1K-blocks Used Available Use% Mounted on
- parts = lines[1].split()
- if len(parts) >= 2:
- filesystem_type = parts[1]
- mount_point = parts[-1] if len(parts) >= 7 else path
+        # Try the primary path first, then fall back to /data for backwards compatibility
+ paths_to_try = [path]
+ if path != "/data" and not os.path.exists(path):
+ paths_to_try.append("/data")
+
+ for check_path in paths_to_try:
+ try:
+ # Use df -T to get filesystem type
+ result = subprocess.run(
+ ["df", "-T", check_path], capture_output=True, text=True, check=True
+ )
+
+ lines = result.stdout.strip().split("\n")
+ if len(lines) >= 2:
+ # Second line contains the filesystem info
+ # Format: Filesystem Type 1K-blocks Used Available Use% Mounted on
+ parts = lines[1].split()
+ if len(parts) >= 2:
+ filesystem_type = parts[1]
+ mount_point = parts[-1] if len(parts) >= 7 else check_path
+
+ return {
+ "filesystem": filesystem_type,
+ "mount_point": mount_point,
+ "data_path": check_path,
+ }
+ except subprocess.CalledProcessError as e:
+ self.logger.warning(
+ f"Failed to detect filesystem for {check_path}: {e}"
+ )
+ continue
+ except Exception as e:
+ self.logger.warning(f"Error detecting filesystem for {check_path}: {e}")
+ continue
+ # Fallback: try to detect from /proc/mounts
+ for check_path in paths_to_try:
+ try:
+ with open("/proc/mounts", "r") as f:
+ mounts = f.readlines()
+
+ # Find the mount that contains our path
+ best_match = ""
+ best_fs = "unknown"
+
+ for line in mounts:
+ parts = line.strip().split()
+ if len(parts) >= 3:
+ mount_point = parts[1]
+ fs_type = parts[2]
+
+ # Check if this mount point is a prefix of our path
+ if check_path.startswith(mount_point) and len(
+ mount_point
+ ) > len(best_match):
+ best_match = mount_point
+ best_fs = fs_type
+
+ if best_fs != "unknown":
return {
- "filesystem": filesystem_type,
- "mount_point": mount_point,
- "data_path": path,
+ "filesystem": best_fs,
+ "mount_point": best_match,
+ "data_path": check_path,
}
- except subprocess.CalledProcessError as e:
- self.logger.warning(f"Failed to detect filesystem for {path}: {e}")
- except Exception as e:
- self.logger.warning(f"Error detecting filesystem for {path}: {e}")
- # Fallback: try to detect from /proc/mounts
- try:
- with open("/proc/mounts", "r") as f:
- mounts = f.readlines()
-
- # Find the mount that contains our path
- best_match = ""
- best_fs = "unknown"
-
- for line in mounts:
- parts = line.strip().split()
- if len(parts) >= 3:
- mount_point = parts[1]
- fs_type = parts[2]
-
- # Check if this mount point is a prefix of our path
- if path.startswith(mount_point) and len(mount_point) > len(
- best_match
- ):
- best_match = mount_point
- best_fs = fs_type
-
- if best_fs != "unknown":
- return {
- "filesystem": best_fs,
- "mount_point": best_match,
- "data_path": path,
- }
-
- except Exception as e:
- self.logger.warning(f"Error reading /proc/mounts: {e}")
+ except Exception as e:
+ self.logger.warning(f"Error reading /proc/mounts for {check_path}: {e}")
+ continue
# Final fallback
- return {"filesystem": "unknown", "mount_point": "/", "data_path": path}
+ return {
+ "filesystem": "unknown",
+ "mount_point": "/",
+ "data_path": paths_to_try[0],
+ }
def connect_to_milvus(self) -> bool:
"""Connect to Milvus server"""
@@ -440,13 +456,47 @@ class MilvusBenchmark:
"""Run complete benchmark suite"""
self.logger.info("Starting Milvus benchmark suite...")
- # Detect filesystem information
- fs_info = self.get_filesystem_info("/data")
+ # Detect filesystem information - Milvus data path first
+ milvus_data_path = "/data/milvus"
+ if os.path.exists(milvus_data_path):
+ # Multi-fs mode: Milvus data is on dedicated filesystem
+ fs_info = self.get_filesystem_info(milvus_data_path)
+ self.logger.info(
+ f"Multi-filesystem mode: Using {milvus_data_path} for filesystem detection"
+ )
+ else:
+ # Single-fs mode: fallback to /data
+ fs_info = self.get_filesystem_info("/data")
+ self.logger.info(
+                "Single-filesystem mode: Using /data for filesystem detection"
+ )
+
self.results["system_info"] = fs_info
+
+ # Add kernel version and hostname to system info
+ try:
+ import socket
+
+ # Get hostname
+ self.results["system_info"]["hostname"] = socket.gethostname()
+
+ # Get kernel version using uname -r
+ kernel_result = subprocess.run(['uname', '-r'], capture_output=True, text=True, check=True)
+ self.results["system_info"]["kernel_version"] = kernel_result.stdout.strip()
+
+ self.logger.info(
+ f"System info: hostname={self.results['system_info']['hostname']}, "
+ f"kernel={self.results['system_info']['kernel_version']}"
+ )
+ except Exception as e:
+ self.logger.warning(f"Could not collect kernel info: {e}")
+ self.results["system_info"]["kernel_version"] = "unknown"
+ self.results["system_info"]["hostname"] = "unknown"
+
# Also add filesystem at top level for compatibility with existing graphs
self.results["filesystem"] = fs_info["filesystem"]
self.logger.info(
- f"Detected filesystem: {fs_info['filesystem']} at {fs_info['mount_point']}"
+ f"Detected filesystem: {fs_info['filesystem']} at {fs_info['mount_point']} (data path: {fs_info['data_path']})"
)
if not self.connect_to_milvus():
diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml
index 4b35d9f6..d36790b0 100644
--- a/playbooks/roles/gen_hosts/tasks/main.yml
+++ b/playbooks/roles/gen_hosts/tasks/main.yml
@@ -381,6 +381,25 @@
- workflows_reboot_limit
- ansible_hosts_template.stat.exists
+- name: Load AI nodes configuration for multi-filesystem setup
+ include_vars:
+ file: "{{ topdir_path }}/{{ kdevops_nodes }}"
+ name: guestfs_nodes
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - ansible_hosts_template.stat.exists
+
+- name: Extract AI node names for multi-filesystem setup
+ set_fact:
+ all_generic_nodes: "{{ guestfs_nodes.guestfs_nodes | map(attribute='name') | list }}"
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - guestfs_nodes is defined
+
- name: Generate the Ansible hosts file for a dedicated AI setup
tags: ['hosts']
ansible.builtin.template:
diff --git a/playbooks/roles/gen_hosts/templates/fstests.j2 b/playbooks/roles/gen_hosts/templates/fstests.j2
index ac086c6e..32d90abf 100644
--- a/playbooks/roles/gen_hosts/templates/fstests.j2
+++ b/playbooks/roles/gen_hosts/templates/fstests.j2
@@ -70,6 +70,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
[krb5:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
+{% if kdevops_enable_iscsi or kdevops_nfsd_enable or kdevops_smbd_enable or kdevops_krb5_enable %}
[service]
{% if kdevops_enable_iscsi %}
{{ kdevops_hosts_prefix }}-iscsi
@@ -85,3 +86,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
[service:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% endif %}
diff --git a/playbooks/roles/gen_hosts/templates/gitr.j2 b/playbooks/roles/gen_hosts/templates/gitr.j2
index 7f9094d4..3f30a5fb 100644
--- a/playbooks/roles/gen_hosts/templates/gitr.j2
+++ b/playbooks/roles/gen_hosts/templates/gitr.j2
@@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
[nfsd:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
+{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
[service]
{% if kdevops_enable_iscsi %}
{{ kdevops_hosts_prefix }}-iscsi
@@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
[service:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% endif %}
diff --git a/playbooks/roles/gen_hosts/templates/hosts.j2 b/playbooks/roles/gen_hosts/templates/hosts.j2
index cdcd1883..e9441605 100644
--- a/playbooks/roles/gen_hosts/templates/hosts.j2
+++ b/playbooks/roles/gen_hosts/templates/hosts.j2
@@ -119,39 +119,30 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
[ai:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
-{% set fs_configs = [] %}
+{# Individual section groups for multi-filesystem testing #}
+{% set section_names = [] %}
{% for node in all_generic_nodes %}
-{% set node_parts = node.split('-') %}
-{% if node_parts|length >= 3 %}
-{% set fs_type = node_parts[2] %}
-{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
-{% set fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
-{% if fs_group not in fs_configs %}
-{% set _ = fs_configs.append(fs_group) %}
+{% if not node.endswith('-dev') %}
+{% set section = node.replace(kdevops_host_prefix + '-ai-', '') %}
+{% if section != kdevops_host_prefix + '-ai' %}
+{% if section_names.append(section) %}{% endif %}
{% endif %}
{% endif %}
{% endfor %}
-{% for fs_group in fs_configs %}
-[ai_{{ fs_group }}]
-{% for node in all_generic_nodes %}
-{% set node_parts = node.split('-') %}
-{% if node_parts|length >= 3 %}
-{% set fs_type = node_parts[2] %}
-{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
-{% set node_fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
-{% if node_fs_group == fs_group %}
-{{ node }}
-{% endif %}
+{% for section in section_names %}
+[ai_{{ section | replace('-', '_') }}]
+{{ kdevops_host_prefix }}-ai-{{ section }}
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-ai-{{ section }}-dev
{% endif %}
-{% endfor %}
-[ai_{{ fs_group }}:vars]
+[ai_{{ section | replace('-', '_') }}:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endfor %}
{% else %}
-{# Single-node AI hosts #}
+{# Single filesystem hosts (original behavior) #}
[all]
localhost ansible_connection=local
{{ kdevops_host_prefix }}-ai
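
The reworked template above derives one [ai_<section>] group per filesystem profile by stripping the "<prefix>-ai-" part of each baseline node name and swapping '-' for '_'. A hedged Python sketch of the same grouping, with illustrative node names:

def ai_groups(nodes, prefix="debian13", baseline_and_dev=True):
    """Sketch of the per-section grouping the Jinja2 template performs."""
    groups = {}
    for node in nodes:
        if node.endswith("-dev"):
            continue  # dev nodes are appended alongside their baseline below
        section = node.replace(prefix + "-ai-", "")
        if section == node:
            continue  # plain "<prefix>-ai" single-node name, no per-fs group
        group = "ai_" + section.replace("-", "_")
        members = [node]
        if baseline_and_dev:
            members.append(node + "-dev")
        groups[group] = members
    return groups

# ai_groups(["debian13-ai-xfs-4k-4ks", "debian13-ai-xfs-4k-4ks-dev"])
# -> {"ai_xfs_4k_4ks": ["debian13-ai-xfs-4k-4ks", "debian13-ai-xfs-4k-4ks-dev"]}
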
diff --git a/playbooks/roles/gen_hosts/templates/nfstest.j2 b/playbooks/roles/gen_hosts/templates/nfstest.j2
index e427ac34..709d871d 100644
--- a/playbooks/roles/gen_hosts/templates/nfstest.j2
+++ b/playbooks/roles/gen_hosts/templates/nfstest.j2
@@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
[nfsd:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
+{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
[service]
{% if kdevops_enable_iscsi %}
{{ kdevops_hosts_prefix }}-iscsi
@@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{% endif %}
[service:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% endif %}
diff --git a/playbooks/roles/gen_hosts/templates/pynfs.j2 b/playbooks/roles/gen_hosts/templates/pynfs.j2
index 85c87dae..55add4d1 100644
--- a/playbooks/roles/gen_hosts/templates/pynfs.j2
+++ b/playbooks/roles/gen_hosts/templates/pynfs.j2
@@ -23,6 +23,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{{ kdevops_hosts_prefix }}-nfsd
[nfsd:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% if true %}
[service]
{% if kdevops_enable_iscsi %}
{{ kdevops_hosts_prefix }}-iscsi
@@ -30,3 +31,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
{{ kdevops_hosts_prefix }}-nfsd
[service:vars]
ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
+{% endif %}
diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml
index d54977be..b294d294 100644
--- a/playbooks/roles/gen_nodes/tasks/main.yml
+++ b/playbooks/roles/gen_nodes/tasks/main.yml
@@ -658,6 +658,7 @@
- kdevops_workflow_enable_ai
- ansible_nodes_template.stat.exists
- not kdevops_baseline_and_dev
+ - not ai_enable_multifs_testing|default(false)|bool
- name: Generate the AI kdevops nodes file with dev hosts using {{ kdevops_nodes_template }} as jinja2 source template
tags: ['hosts']
@@ -675,6 +676,95 @@
- kdevops_workflow_enable_ai
- ansible_nodes_template.stat.exists
- kdevops_baseline_and_dev
+ - not ai_enable_multifs_testing|default(false)|bool
+
+- name: Infer enabled AI multi-filesystem configurations
+ vars:
+ kdevops_config_data: "{{ lookup('file', topdir_path + '/.config') }}"
+ # Find all enabled AI multifs configurations
+ xfs_configs: >-
+ {{
+ kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_XFS_(.*)=y$', multiline=True)
+ | map('lower')
+ | map('regex_replace', '_', '-')
+ | map('regex_replace', '^', 'xfs-')
+ | list
+ if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_XFS=y$', multiline=True)
+ else []
+ }}
+ ext4_configs: >-
+ {{
+ kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_EXT4_(.*)=y$', multiline=True)
+ | map('lower')
+ | map('regex_replace', '_', '-')
+ | map('regex_replace', '^', 'ext4-')
+ | list
+ if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_EXT4=y$', multiline=True)
+ else []
+ }}
+ btrfs_configs: >-
+ {{
+ kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_BTRFS_(.*)=y$', multiline=True)
+ | map('lower')
+ | map('regex_replace', '_', '-')
+ | map('regex_replace', '^', 'btrfs-')
+ | list
+ if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_BTRFS=y$', multiline=True)
+ else []
+ }}
+ set_fact:
+ ai_multifs_enabled_configs: "{{ (xfs_configs + ext4_configs + btrfs_configs) | unique }}"
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - ansible_nodes_template.stat.exists
+
+- name: Create AI nodes for each filesystem configuration (no dev)
+ vars:
+ filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
+ set_fact:
+ ai_enabled_section_types: "{{ filesystem_nodes }}"
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - ansible_nodes_template.stat.exists
+ - not kdevops_baseline_and_dev
+ - ai_multifs_enabled_configs is defined
+ - ai_multifs_enabled_configs | length > 0
+
+- name: Create AI nodes for each filesystem configuration with dev hosts
+ vars:
+ filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
+ set_fact:
+ ai_enabled_section_types: "{{ filesystem_nodes | product(['', '-dev']) | map('join') | list }}"
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - ansible_nodes_template.stat.exists
+ - kdevops_baseline_and_dev
+ - ai_multifs_enabled_configs is defined
+ - ai_multifs_enabled_configs | length > 0
+
+- name: Generate the AI multi-filesystem kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template
+ tags: [ 'hosts' ]
+ vars:
+ node_template: "{{ kdevops_nodes_template | basename }}"
+ nodes: "{{ ai_enabled_section_types | regex_replace('\\[') | regex_replace('\\]') | replace(\"'\", '') | split(', ') }}"
+ all_generic_nodes: "{{ ai_enabled_section_types }}"
+ template:
+ src: "{{ node_template }}"
+ dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
+ force: yes
+ when:
+ - kdevops_workflows_dedicated_workflow
+ - kdevops_workflow_enable_ai
+ - ai_enable_multifs_testing|default(false)|bool
+ - ansible_nodes_template.stat.exists
+ - ai_enabled_section_types is defined
+ - ai_enabled_section_types | length > 0
- name: Get the control host's timezone
ansible.builtin.command: "timedatectl show -p Timezone --value"
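
The "Infer enabled AI multi-filesystem configurations" task above turns CONFIG_AI_MULTIFS_* symbols in .config into node-name suffixes such as xfs-16k-4ks. The following Python sketch mirrors that regex logic; it is illustrative, not the role's code:

import re

def enabled_multifs_configs(dotconfig_text: str) -> list[str]:
    configs = []
    for fs in ("XFS", "EXT4", "BTRFS"):
        # Only collect profiles when the per-filesystem TEST toggle is enabled
        if not re.search(rf"^CONFIG_AI_MULTIFS_TEST_{fs}=y$", dotconfig_text, re.M):
            continue
        for suffix in re.findall(rf"^CONFIG_AI_MULTIFS_{fs}_(.*)=y$", dotconfig_text, re.M):
            configs.append(fs.lower() + "-" + suffix.lower().replace("_", "-"))
    return sorted(set(configs))

# enabled_multifs_configs("CONFIG_AI_MULTIFS_TEST_XFS=y\nCONFIG_AI_MULTIFS_XFS_16K_4KS=y\n")
# -> ["xfs-16k-4ks"]
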
diff --git a/playbooks/roles/guestfs/tasks/bringup/main.yml b/playbooks/roles/guestfs/tasks/bringup/main.yml
index c131de25..bd9f5260 100644
--- a/playbooks/roles/guestfs/tasks/bringup/main.yml
+++ b/playbooks/roles/guestfs/tasks/bringup/main.yml
@@ -1,11 +1,16 @@
---
- name: List defined libvirt guests
run_once: true
+ delegate_to: localhost
community.libvirt.virt:
command: list_vms
uri: "{{ libvirt_uri }}"
register: defined_vms
+- name: Debug defined VMs
+ debug:
+ msg: "Hostname: {{ inventory_hostname }}, Defined VMs: {{ hostvars['localhost']['defined_vms']['list_vms'] | default([]) }}, Check: {{ inventory_hostname not in (hostvars['localhost']['defined_vms']['list_vms'] | default([])) }}"
+
- name: Provision each target node
when:
- "inventory_hostname not in defined_vms.list_vms"
@@ -25,10 +30,13 @@
path: "{{ ssh_key_dir }}"
state: directory
mode: "u=rwx"
+ delegate_to: localhost
- name: Generate fresh keys for each target node
ansible.builtin.command:
cmd: 'ssh-keygen -q -t ed25519 -f {{ ssh_key }} -N ""'
+ creates: "{{ ssh_key }}"
+ delegate_to: localhost
- name: Set the pathname of the root disk image for each target node
ansible.builtin.set_fact:
@@ -38,15 +46,18 @@
ansible.builtin.file:
path: "{{ storagedir }}/{{ inventory_hostname }}"
state: directory
+ delegate_to: localhost
- name: Duplicate the root disk image for each target node
ansible.builtin.command:
cmd: "cp --reflink=auto {{ base_image }} {{ root_image }}"
+ delegate_to: localhost
- name: Get the timezone of the control host
ansible.builtin.command:
cmd: "timedatectl show -p Timezone --value"
register: host_timezone
+ delegate_to: localhost
- name: Build the root image for each target node (as root)
become: true
@@ -103,6 +114,7 @@
name: "{{ inventory_hostname }}"
xml: "{{ lookup('file', xml_file) }}"
uri: "{{ libvirt_uri }}"
+ delegate_to: localhost
- name: Find PCIe passthrough devices
ansible.builtin.find:
@@ -110,6 +122,7 @@
file_type: file
patterns: "pcie_passthrough_*.xml"
register: passthrough_devices
+ delegate_to: localhost
- name: Attach PCIe passthrough devices to each target node
environment:
@@ -124,6 +137,7 @@
loop: "{{ passthrough_devices.files }}"
loop_control:
label: "Doing PCI-E passthrough for device {{ item }}"
+ delegate_to: localhost
when:
- passthrough_devices.matched > 0
@@ -142,3 +156,4 @@
name: "{{ inventory_hostname }}"
uri: "{{ libvirt_uri }}"
state: running
+ delegate_to: localhost
diff --git a/scripts/guestfs.Makefile b/scripts/guestfs.Makefile
index bd03f58c..f6c350a4 100644
--- a/scripts/guestfs.Makefile
+++ b/scripts/guestfs.Makefile
@@ -79,7 +79,7 @@ bringup_guestfs: $(GUESTFS_BRINGUP_DEPS)
--extra-vars=@./extra_vars.yaml \
--tags network,pool,base_image
$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
- --limit 'baseline:dev:service' \
+ --limit 'baseline:dev:service:ai' \
playbooks/guestfs.yml \
--extra-vars=@./extra_vars.yaml \
--tags bringup
diff --git a/workflows/ai/Kconfig b/workflows/ai/Kconfig
index 2ffc6b65..d04570d8 100644
--- a/workflows/ai/Kconfig
+++ b/workflows/ai/Kconfig
@@ -161,4 +161,17 @@ config AI_BENCHMARK_ITERATIONS
# Docker storage configuration
source "workflows/ai/Kconfig.docker-storage"
+# Multi-filesystem configuration
+config AI_MULTIFS_ENABLE
+ bool "Enable multi-filesystem benchmarking"
+ output yaml
+ default n
+ help
+ Run AI benchmarks across multiple filesystem configurations
+ to compare performance characteristics.
+
+if AI_MULTIFS_ENABLE
+source "workflows/ai/Kconfig.multifs"
+endif
+
endif # KDEVOPS_WORKFLOW_ENABLE_AI
diff --git a/workflows/ai/Kconfig.fs b/workflows/ai/Kconfig.fs
new file mode 100644
index 00000000..a95d02c6
--- /dev/null
+++ b/workflows/ai/Kconfig.fs
@@ -0,0 +1,118 @@
+menu "Target filesystem to use"
+
+choice
+ prompt "Target filesystem"
+ default AI_FILESYSTEM_XFS
+
+config AI_FILESYSTEM_XFS
+ bool "xfs"
+ select HAVE_SUPPORTS_PURE_IOMAP if BOOTLINUX_TREE_LINUS || BOOTLINUX_TREE_STABLE
+ help
+ This will target testing AI workloads on top of XFS.
+ XFS provides excellent performance for large datasets
+ and is commonly used in high-performance computing.
+
+config AI_FILESYSTEM_BTRFS
+ bool "btrfs"
+ help
+ This will target testing AI workloads on top of btrfs.
+ Btrfs provides features like snapshots and compression
+ which can be useful for AI dataset management.
+
+config AI_FILESYSTEM_EXT4
+ bool "ext4"
+ help
+ This will target testing AI workloads on top of ext4.
+ Ext4 is widely supported and provides reliable performance
+ for AI workloads.
+
+endchoice
+
+config AI_FILESYSTEM
+ string
+ output yaml
+ default "xfs" if AI_FILESYSTEM_XFS
+ default "btrfs" if AI_FILESYSTEM_BTRFS
+ default "ext4" if AI_FILESYSTEM_EXT4
+
+config AI_FSTYPE
+ string
+ output yaml
+ default "xfs" if AI_FILESYSTEM_XFS
+ default "btrfs" if AI_FILESYSTEM_BTRFS
+ default "ext4" if AI_FILESYSTEM_EXT4
+
+if AI_FILESYSTEM_XFS
+
+menu "XFS configuration"
+
+config AI_XFS_MKFS_OPTS
+ string "mkfs.xfs options"
+ output yaml
+ default "-f -s size=4096"
+ help
+ Additional options to pass to mkfs.xfs when creating
+ the filesystem for AI workloads.
+
+config AI_XFS_MOUNT_OPTS
+ string "XFS mount options"
+ output yaml
+ default "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
+ help
+ Mount options for XFS filesystem. These options are
+ optimized for AI workloads with large sequential I/O.
+
+endmenu
+
+endif # AI_FILESYSTEM_XFS
+
+if AI_FILESYSTEM_BTRFS
+
+menu "Btrfs configuration"
+
+config AI_BTRFS_MKFS_OPTS
+ string "mkfs.btrfs options"
+ output yaml
+ default "-f"
+ help
+ Additional options to pass to mkfs.btrfs when creating
+ the filesystem for AI workloads.
+
+config AI_BTRFS_MOUNT_OPTS
+ string "Btrfs mount options"
+ output yaml
+ default "rw,relatime,compress=lz4,space_cache=v2"
+ help
+ Mount options for Btrfs filesystem. LZ4 compression
+ can help with AI datasets while maintaining performance.
+
+endmenu
+
+endif # AI_FILESYSTEM_BTRFS
+
+if AI_FILESYSTEM_EXT4
+
+menu "Ext4 configuration"
+
+config AI_EXT4_MKFS_OPTS
+ string "mkfs.ext4 options"
+ output yaml
+ default "-F"
+ help
+ Additional options to pass to mkfs.ext4 when creating
+ the filesystem for AI workloads.
+
+config AI_EXT4_MOUNT_OPTS
+ string "Ext4 mount options"
+ output yaml
+ default "rw,relatime,data=ordered"
+ help
+ Mount options for Ext4 filesystem optimized for
+ AI workload patterns.
+
+endmenu
+
+endif # AI_FILESYSTEM_EXT4
+
+endmenu
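
The mkfs and mount option strings defined above are presumably consumed by the provisioning role, which is not part of this excerpt. A hedged sketch of that consumption, using the XFS defaults as example values:

import shlex
import subprocess

def mkfs_and_mount(fstype, device, mountpoint, mkfs_opts, mount_opts):
    # e.g. fstype="xfs", mkfs_opts="-f -s size=4096",
    #      mount_opts="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
    subprocess.run(["mkfs." + fstype, *shlex.split(mkfs_opts), device], check=True)
    subprocess.run(["mount", "-t", fstype, "-o", mount_opts, device, mountpoint],
                   check=True)
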
diff --git a/workflows/ai/Kconfig.multifs b/workflows/ai/Kconfig.multifs
new file mode 100644
index 00000000..2b72dd6c
--- /dev/null
+++ b/workflows/ai/Kconfig.multifs
@@ -0,0 +1,184 @@
+menu "Multi-filesystem testing configuration"
+
+config AI_ENABLE_MULTIFS_TESTING
+ bool "Enable multi-filesystem testing"
+ default n
+ output yaml
+ help
+ Enable testing the same AI workload across multiple filesystem
+ configurations. This allows comparing performance characteristics
+ between different filesystems and their configurations.
+
+ When enabled, the AI benchmark will run sequentially across all
+ selected filesystem configurations, allowing for detailed
+ performance analysis across different storage backends.
+
+if AI_ENABLE_MULTIFS_TESTING
+
+config AI_MULTIFS_TEST_XFS
+ bool "Test XFS configurations"
+ default y
+ output yaml
+ help
+ Enable testing AI workloads on XFS filesystem with different
+ block size configurations.
+
+if AI_MULTIFS_TEST_XFS
+
+menu "XFS configuration profiles"
+
+config AI_MULTIFS_XFS_4K_4KS
+ bool "XFS 4k block size - 4k sector size"
+ default y
+ output yaml
+ help
+ Test AI workloads on XFS with 4k filesystem block size
+ and 4k sector size. This is the most common configuration
+ and provides good performance for most workloads.
+
+config AI_MULTIFS_XFS_16K_4KS
+ bool "XFS 16k block size - 4k sector size"
+ default y
+ output yaml
+ help
+ Test AI workloads on XFS with 16k filesystem block size
+ and 4k sector size. Larger block sizes can improve performance
+ for sequential I/O patterns common in AI workloads.
+
+config AI_MULTIFS_XFS_32K_4KS
+ bool "XFS 32k block size - 4k sector size"
+ default y
+ output yaml
+ help
+ Test AI workloads on XFS with 32k filesystem block size
+ and 4k sector size. Even larger block sizes can provide
+ benefits for large sequential I/O operations typical in
+ AI vector database workloads.
+
+config AI_MULTIFS_XFS_64K_4KS
+ bool "XFS 64k block size - 4k sector size"
+ default y
+ output yaml
+ help
+ Test AI workloads on XFS with 64k filesystem block size
+ and 4k sector size. Maximum supported block size for XFS,
+ optimized for very large file operations and high-throughput
+ AI workloads with substantial data transfers.
+
+endmenu
+
+endif # AI_MULTIFS_TEST_XFS
+
+config AI_MULTIFS_TEST_EXT4
+ bool "Test ext4 configurations"
+ default y
+ output yaml
+ help
+ Enable testing AI workloads on ext4 filesystem with different
+ configurations including bigalloc options.
+
+if AI_MULTIFS_TEST_EXT4
+
+menu "ext4 configuration profiles"
+
+config AI_MULTIFS_EXT4_4K
+ bool "ext4 4k block size"
+ default y
+ output yaml
+ help
+ Test AI workloads on ext4 with standard 4k block size.
+ This is the default ext4 configuration.
+
+config AI_MULTIFS_EXT4_16K_BIGALLOC
+ bool "ext4 16k bigalloc"
+ default y
+ output yaml
+ help
+ Test AI workloads on ext4 with 16k bigalloc enabled.
+ Bigalloc reduces metadata overhead and can improve
+ performance for large file workloads.
+
+endmenu
+
+endif # AI_MULTIFS_TEST_EXT4
+
+config AI_MULTIFS_TEST_BTRFS
+ bool "Test btrfs configurations"
+ default y
+ output yaml
+ help
+ Enable testing AI workloads on btrfs filesystem with
+ common default configuration profile.
+
+if AI_MULTIFS_TEST_BTRFS
+
+menu "btrfs configuration profiles"
+
+config AI_MULTIFS_BTRFS_DEFAULT
+ bool "btrfs default profile"
+ default y
+ output yaml
+ help
+ Test AI workloads on btrfs with default configuration.
+ This includes modern defaults with free-space-tree and
+ no-holes features enabled.
+
+endmenu
+
+endif # AI_MULTIFS_TEST_BTRFS
+
+config AI_MULTIFS_RESULTS_DIR
+ string "Multi-filesystem results directory"
+ output yaml
+ default "/data/ai-multifs-benchmark"
+ help
+ Directory where multi-filesystem test results and logs will be stored.
+ Each filesystem configuration will have its own subdirectory.
+
+config AI_MILVUS_STORAGE_ENABLE
+ bool "Enable dedicated Milvus storage with filesystem matching node profile"
+ default y
+ output yaml
+ help
+ Configure a dedicated storage device for Milvus data including
+ vector data (MinIO), metadata (etcd), and local cache. The filesystem
+ type will automatically match the node's configuration profile.
+
+config AI_MILVUS_DEVICE
+ string "Device to use for Milvus storage"
+ output yaml
+ default "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_NVME
+ default "/dev/disk/by-id/virtio-kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_VIRTIO
+ default "/dev/disk/by-id/ata-QEMU_HARDDISK_kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_IDE
+ default "/dev/nvme3n1" if TERRAFORM_AWS_INSTANCE_M5AD_2XLARGE
+ default "/dev/nvme3n1" if TERRAFORM_AWS_INSTANCE_M5AD_4XLARGE
+ default "/dev/nvme3n1" if TERRAFORM_GCE
+ default "/dev/sde" if TERRAFORM_AZURE
+ default TERRAFORM_OCI_SPARSE_VOLUME_DEVICE_FILE_NAME if TERRAFORM_OCI
+ help
+ The device to use for Milvus storage. This device will be
+ formatted with the filesystem type matching the node's profile
+ and mounted at /data/milvus.
+
+config AI_MILVUS_MOUNT_POINT
+ string "Mount point for Milvus storage"
+ output yaml
+ default "/data/milvus"
+ help
+ The path where the Milvus storage filesystem will be mounted.
+ All Milvus data directories (data/, etcd/, minio/) will be
+ created under this mount point.
+
+config AI_MILVUS_USE_NODE_FS
+ bool "Automatically detect filesystem type from node name"
+ default y
+ output yaml
+ help
+ When enabled, the filesystem type for Milvus storage will be
+ automatically determined based on the node's configuration name.
+ For example, nodes named *-xfs-* will use XFS, *-ext4-* will
+ use ext4, and *-btrfs-* will use Btrfs.
+
+endif # AI_ENABLE_MULTIFS_TESTING
+
+endmenu
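
The AI_MILVUS_USE_NODE_FS help text describes inferring the Milvus storage filesystem from the node name. A small illustrative sketch of that mapping (the Ansible role that does this is not shown in this excerpt):

def fstype_from_node(node_name: str, default: str = "xfs") -> str:
    """Map a node name like '<prefix>-ai-xfs-16k-4ks' to a filesystem type."""
    for fstype in ("xfs", "ext4", "btrfs"):
        if f"-{fstype}-" in node_name or node_name.endswith(f"-{fstype}"):
            return fstype
    return default

# fstype_from_node("debian13-ai-ext4-16k-bigalloc") -> "ext4"
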
diff --git a/workflows/ai/scripts/analysis_config.json b/workflows/ai/scripts/analysis_config.json
index 2f90f4d5..5f0a9328 100644
--- a/workflows/ai/scripts/analysis_config.json
+++ b/workflows/ai/scripts/analysis_config.json
@@ -2,5 +2,5 @@
"enable_graphing": true,
"graph_format": "png",
"graph_dpi": 150,
- "graph_theme": "seaborn"
+ "graph_theme": "default"
}
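
The theme change above is presumably needed because newer matplotlib releases dropped the bare "seaborn" style alias. A defensive sketch for applying a configured theme without risking a crash on stale configs:

import matplotlib.pyplot as plt

def apply_theme(theme: str) -> None:
    # "default" is always accepted but not listed in plt.style.available
    if theme == "default" or theme in plt.style.available:
        plt.style.use(theme)
    else:
        # e.g. an old analysis_config.json that still says "seaborn"
        plt.style.use("default")
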
diff --git a/workflows/ai/scripts/analyze_results.py b/workflows/ai/scripts/analyze_results.py
index 3d11fb11..2dc4a1d6 100755
--- a/workflows/ai/scripts/analyze_results.py
+++ b/workflows/ai/scripts/analyze_results.py
@@ -226,6 +226,68 @@ class ResultsAnalyzer:
return fs_info
+ def _extract_filesystem_config(
+ self, result: Dict[str, Any]
+ ) -> tuple[str, str, str]:
+ """Extract filesystem type and block size from result data.
+ Returns (fs_type, block_size, config_key)"""
+ filename = result.get("_file", "")
+
+ # Primary: Extract filesystem type from filename (more reliable than JSON)
+ fs_type = "unknown"
+ block_size = "default"
+
+ if "xfs" in filename:
+ fs_type = "xfs"
+ # Check larger sizes first to avoid substring matches
+ if "64k" in filename and "64k-" in filename:
+ block_size = "64k"
+ elif "32k" in filename and "32k-" in filename:
+ block_size = "32k"
+ elif "16k" in filename and "16k-" in filename:
+ block_size = "16k"
+ elif "4k" in filename and "4k-" in filename:
+ block_size = "4k"
+ elif "ext4" in filename:
+ fs_type = "ext4"
+ if "16k" in filename:
+ block_size = "16k"
+ elif "4k" in filename:
+ block_size = "4k"
+ elif "btrfs" in filename:
+ fs_type = "btrfs"
+ block_size = "default"
+ else:
+ # Fallback to JSON data if filename parsing fails
+ fs_type = result.get("filesystem", "unknown")
+ self.logger.warning(
+ f"Could not determine filesystem from filename {filename}, using JSON data: {fs_type}"
+ )
+
+ config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+ return fs_type, block_size, config_key
+
+ def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
+ """Extract node hostname and determine if it's a dev node.
+ Returns (hostname, is_dev_node)"""
+ # Get hostname from system_info (preferred) or fall back to filename
+ system_info = result.get("system_info", {})
+ hostname = system_info.get("hostname", "")
+
+ # If no hostname in system_info, try extracting from filename
+ if not hostname:
+ filename = result.get("_file", "")
+ # Remove results_ prefix and .json suffix
+ hostname = filename.replace("results_", "").replace(".json", "")
+ # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit():
+ hostname = "_".join(hostname.split("_")[:-1])
+
+ # Determine if this is a dev node
+ is_dev = hostname.endswith("-dev")
+
+ return hostname, is_dev
+
def load_results(self) -> bool:
"""Load all result files from the results directory"""
try:
@@ -391,6 +453,8 @@ class ResultsAnalyzer:
html.append(
" .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
)
+ html.append(" .baseline-row { background-color: #e8f5e9; }")
+ html.append(" .dev-row { background-color: #e3f2fd; }")
html.append(" </style>")
html.append("</head>")
html.append("<body>")
@@ -486,26 +550,69 @@ class ResultsAnalyzer:
else:
html.append(" <p>No storage device information available.</p>")
- # Filesystem section
- html.append(" <h3>🗂️ Filesystem Configuration</h3>")
- fs_info = self.system_info.get("filesystem_info", {})
- html.append(" <table class='config-table'>")
- html.append(
- " <tr><td>Filesystem Type</td><td>"
- + str(fs_info.get("filesystem_type", "Unknown"))
- + "</td></tr>"
- )
- html.append(
- " <tr><td>Mount Point</td><td>"
- + str(fs_info.get("mount_point", "Unknown"))
- + "</td></tr>"
- )
- html.append(
- " <tr><td>Mount Options</td><td>"
- + str(fs_info.get("mount_options", "Unknown"))
- + "</td></tr>"
- )
- html.append(" </table>")
+ # Node Configuration section - Extract from actual benchmark results
+ html.append(" <h3>🗂️ Node Configuration</h3>")
+
+ # Collect node and filesystem information from benchmark results
+ node_configs = {}
+ for result in self.results_data:
+ # Extract node information
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(
+ result
+ )
+
+ system_info = result.get("system_info", {})
+ data_path = system_info.get("data_path", "/data/milvus")
+ mount_point = system_info.get("mount_point", "/data")
+ kernel_version = system_info.get("kernel_version", "unknown")
+
+ if hostname not in node_configs:
+ node_configs[hostname] = {
+ "hostname": hostname,
+ "node_type": "Development" if is_dev else "Baseline",
+ "filesystem": fs_type,
+ "block_size": block_size,
+ "data_path": data_path,
+ "mount_point": mount_point,
+ "kernel": kernel_version,
+ "test_count": 0,
+ }
+ node_configs[hostname]["test_count"] += 1
+
+ if node_configs:
+ html.append(" <table class='config-table'>")
+ html.append(
+ " <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
+ )
+ # Sort nodes with baseline first, then dev
+ sorted_nodes = sorted(
+ node_configs.items(),
+ key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+ )
+ for hostname, config_info in sorted_nodes:
+ row_class = (
+ "dev-row"
+ if config_info["node_type"] == "Development"
+ else "baseline-row"
+ )
+ html.append(f" <tr class='{row_class}'>")
+ html.append(f" <td><strong>{hostname}</strong></td>")
+ html.append(f" <td>{config_info['node_type']}</td>")
+ html.append(f" <td>{config_info['filesystem']}</td>")
+ html.append(f" <td>{config_info['block_size']}</td>")
+ html.append(f" <td>{config_info['data_path']}</td>")
+ html.append(
+ f" <td>{config_info['mount_point']}</td>"
+ )
+ html.append(f" <td>{config_info['kernel']}</td>")
+ html.append(f" <td>{config_info['test_count']}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+ else:
+ html.append(
+ " <p>No node configuration data found in results.</p>"
+ )
html.append(" </div>")
# Test Configuration Section
@@ -551,92 +658,192 @@ class ResultsAnalyzer:
html.append(" </table>")
html.append(" </div>")
- # Performance Results Section
+ # Performance Results Section - Per Node
html.append(" <div class='section'>")
- html.append(" <h2>📊 Performance Results Summary</h2>")
+ html.append(" <h2>📊 Performance Results by Node</h2>")
if self.results_data:
- # Insert performance
- insert_times = [
- r.get("insert_performance", {}).get("total_time_seconds", 0)
- for r in self.results_data
- ]
- insert_rates = [
- r.get("insert_performance", {}).get("vectors_per_second", 0)
- for r in self.results_data
- ]
-
- if insert_times and any(t > 0 for t in insert_times):
- html.append(" <h3>📈 Vector Insert Performance</h3>")
- html.append(" <table class='metric-table'>")
- html.append(
- f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
- )
- html.append(
- f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ # Group results by node
+ node_performance = {}
+
+ for result in self.results_data:
+ # Use node hostname as the grouping key
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(
+ result
)
- html.append(
- f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
- )
- html.append(" </table>")
- # Index performance
- index_times = [
- r.get("index_performance", {}).get("creation_time_seconds", 0)
- for r in self.results_data
- ]
- if index_times and any(t > 0 for t in index_times):
- html.append(" <h3>🔗 Index Creation Performance</h3>")
- html.append(" <table class='metric-table'>")
- html.append(
- f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "hostname": hostname,
+ "node_type": "Development" if is_dev else "Baseline",
+ "insert_rates": [],
+ "insert_times": [],
+ "index_times": [],
+ "query_performance": {},
+ "filesystem": fs_type,
+ "block_size": block_size,
+ }
+
+ # Add insert performance
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ rate = insert_perf.get("vectors_per_second", 0)
+ time = insert_perf.get("total_time_seconds", 0)
+ if rate > 0:
+ node_performance[hostname]["insert_rates"].append(rate)
+ if time > 0:
+ node_performance[hostname]["insert_times"].append(time)
+
+ # Add index performance
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ time = index_perf.get("creation_time_seconds", 0)
+ if time > 0:
+ node_performance[hostname]["index_times"].append(time)
+
+ # Collect query performance (use first result for each node)
+ query_perf = result.get("query_performance", {})
+ if (
+ query_perf
+ and not node_performance[hostname]["query_performance"]
+ ):
+ node_performance[hostname]["query_performance"] = query_perf
+
+ # Display results for each node, sorted with baseline first
+ sorted_nodes = sorted(
+ node_performance.items(),
+ key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
+ )
+ for hostname, perf_data in sorted_nodes:
+ node_type_badge = (
+ "🔵" if perf_data["node_type"] == "Development" else "🟢"
)
html.append(
- f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
+ f" <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
)
- html.append(" </table>")
-
- # Query performance
- html.append(" <h3>🔍 Query Performance</h3>")
- first_query_perf = self.results_data[0].get("query_performance", {})
- if first_query_perf:
- html.append(" <table>")
html.append(
- " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ f" <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
)
- for topk, topk_data in first_query_perf.items():
- for batch, batch_data in topk_data.items():
- qps = batch_data.get("queries_per_second", 0)
- avg_time = batch_data.get("average_time_seconds", 0) * 1000
-
- # Color coding for performance
- qps_class = ""
- if qps > 1000:
- qps_class = "performance-good"
- elif qps > 100:
- qps_class = "performance-warning"
- else:
- qps_class = "performance-poor"
-
- html.append(f" <tr>")
- html.append(
- f" <td>{topk.replace('topk_', 'Top-')}</td>"
- )
- html.append(
- f" <td>{batch.replace('batch_', 'Batch ')}</td>"
- )
- html.append(
- f" <td class='{qps_class}'>{qps:.2f}</td>"
- )
- html.append(f" <td>{avg_time:.2f}</td>")
- html.append(f" </tr>")
+ # Insert performance
+ insert_rates = perf_data["insert_rates"]
+ if insert_rates:
+ html.append(" <h4>📈 Vector Insert Performance</h4>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Index performance
+ index_times = perf_data["index_times"]
+ if index_times:
+ html.append(" <h4>🔗 Index Creation Performance</h4>")
+ html.append(" <table class='metric-table'>")
+ html.append(
+ f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
+ )
+ html.append(
+ f" <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
+ )
+ html.append(" </table>")
+
+ # Query performance
+ query_perf = perf_data["query_performance"]
+ if query_perf:
+ html.append(" <h4>🔍 Query Performance</h4>")
+ html.append(" <table>")
+ html.append(
+ " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
+ )
- html.append(" </table>")
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ avg_time = (
+ batch_data.get("average_time_seconds", 0) * 1000
+ )
+
+ # Color coding for performance
+ qps_class = ""
+ if qps > 1000:
+ qps_class = "performance-good"
+ elif qps > 100:
+ qps_class = "performance-warning"
+ else:
+ qps_class = "performance-poor"
+
+ html.append(f" <tr>")
+ html.append(
+ f" <td>{topk.replace('topk_', 'Top-')}</td>"
+ )
+ html.append(
+ f" <td>{batch.replace('batch_', 'Batch ')}</td>"
+ )
+ html.append(
+ f" <td class='{qps_class}'>{qps:.2f}</td>"
+ )
+ html.append(f" <td>{avg_time:.2f}</td>")
+ html.append(f" </tr>")
+ html.append(" </table>")
+
+ html.append(" <br>") # Add spacing between configurations
- html.append(" </div>")
+ html.append(" </div>")
# Footer
+ # Performance Graphs Section
+ html.append(" <div class='section'>")
+ html.append(" <h2>📈 Performance Visualizations</h2>")
+ html.append(
+ " <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
+ )
+ html.append(" <ul>")
+ html.append(
+ " <li><strong>Insert Performance:</strong> Shows vector insertion rates and times for each filesystem configuration</li>"
+ )
+ html.append(
+ " <li><strong>Query Performance:</strong> Displays query performance heatmaps for different Top-K and batch sizes</li>"
+ )
+ html.append(
+ " <li><strong>Index Performance:</strong> Compares index creation times across filesystems</li>"
+ )
+ html.append(
+ " <li><strong>Performance Matrix:</strong> Comprehensive comparison matrix of all metrics</li>"
+ )
+ html.append(
+ " <li><strong>Filesystem Comparison:</strong> Side-by-side comparison of filesystem performance</li>"
+ )
+ html.append(" </ul>")
+ html.append(
+ " <p><em>Note: Graphs are generated as separate PNG files in the same directory as this report.</em></p>"
+ )
+ html.append(" <div style='margin-top: 20px;'>")
+ html.append(
+ " <img src='insert_performance.png' alt='Insert Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='query_performance.png' alt='Query Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='index_performance.png' alt='Index Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='performance_matrix.png' alt='Performance Matrix' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(
+ " <img src='filesystem_comparison.png' alt='Filesystem Comparison' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
+ )
+ html.append(" </div>")
+ html.append(" </div>")
+
html.append(" <div class='section'>")
html.append(" <h2>📝 Notes</h2>")
html.append(" <ul>")
@@ -661,10 +868,11 @@ class ResultsAnalyzer:
return "\n".join(html)
except Exception as e:
- self.logger.error(f"Error generating HTML report: {e}")
- return (
- f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
- )
+ import traceback
+
+ tb = traceback.format_exc()
+ self.logger.error(f"Error generating HTML report: {e}\n{tb}")
+ return f"<html><body><h1>Error generating HTML report: {e}</h1><pre>{tb}</pre></body></html>"
def generate_graphs(self) -> bool:
"""Generate performance visualization graphs"""
@@ -691,6 +899,9 @@ class ResultsAnalyzer:
# Graph 4: Performance Comparison Matrix
self._plot_performance_matrix()
+ # Graph 5: Multi-filesystem Comparison (if applicable)
+ self._plot_filesystem_comparison()
+
self.logger.info("Graphs generated successfully")
return True
@@ -699,34 +910,188 @@ class ResultsAnalyzer:
return False
def _plot_insert_performance(self):
- """Plot insert performance metrics"""
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+ """Plot insert performance metrics with node differentiation"""
+ # Group data by node
+ node_performance = {}
- # Extract insert data
- iterations = []
- insert_rates = []
- insert_times = []
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "insert_rates": [],
+ "insert_times": [],
+ "iterations": [],
+ "is_dev": is_dev,
+ }
- for i, result in enumerate(self.results_data):
insert_perf = result.get("insert_performance", {})
if insert_perf:
- iterations.append(i + 1)
- insert_rates.append(insert_perf.get("vectors_per_second", 0))
- insert_times.append(insert_perf.get("total_time_seconds", 0))
-
- # Plot insert rate
- ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
- ax1.set_xlabel("Iteration")
- ax1.set_ylabel("Vectors/Second")
- ax1.set_title("Vector Insert Rate Performance")
- ax1.grid(True, alpha=0.3)
-
- # Plot insert time
- ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
- ax2.set_xlabel("Iteration")
- ax2.set_ylabel("Total Time (seconds)")
- ax2.set_title("Vector Insert Time Performance")
- ax2.grid(True, alpha=0.3)
+ node_performance[hostname]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
+ )
+ node_performance[hostname]["insert_times"].append(
+ insert_perf.get("total_time_seconds", 0)
+ )
+ node_performance[hostname]["iterations"].append(
+ len(node_performance[hostname]["insert_rates"])
+ )
+
+ # Check if we have multiple nodes
+ if len(node_performance) > 1:
+ # Multi-node mode: separate lines for each node
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
+
+ # Sort nodes with baseline first, then dev
+ sorted_nodes = sorted(
+ node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0])
+ )
+
+ # Create color palettes for baseline and dev nodes
+ baseline_colors = [
+ "#2E7D32",
+ "#43A047",
+ "#66BB6A",
+ "#81C784",
+ "#A5D6A7",
+ "#C8E6C9",
+ ] # Greens
+ dev_colors = [
+ "#0D47A1",
+ "#1565C0",
+ "#1976D2",
+ "#1E88E5",
+ "#2196F3",
+ "#42A5F5",
+ "#64B5F6",
+ ] # Blues
+
+ # Additional colors if needed
+ extra_colors = [
+ "#E65100",
+ "#F57C00",
+ "#FF9800",
+ "#FFB300",
+ "#FFC107",
+ "#FFCA28",
+ ] # Oranges
+
+ # Line styles to cycle through
+ line_styles = ["-", "--", "-.", ":"]
+ markers = ["o", "s", "^", "v", "D", "p", "*", "h"]
+
+ baseline_idx = 0
+ dev_idx = 0
+
+ # Use different colors and styles for each node
+ for idx, (hostname, perf_data) in enumerate(sorted_nodes):
+ if not perf_data["insert_rates"]:
+ continue
+
+ # Choose color and style based on node type and index
+ if perf_data["is_dev"]:
+ # Development nodes - blues
+ color = dev_colors[dev_idx % len(dev_colors)]
+ linestyle = line_styles[
+ (dev_idx // len(dev_colors)) % len(line_styles)
+ ]
+ marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev
+ label = f"{hostname} (Dev)"
+ dev_idx += 1
+ else:
+ # Baseline nodes - greens
+ color = baseline_colors[baseline_idx % len(baseline_colors)]
+ linestyle = line_styles[
+ (baseline_idx // len(baseline_colors)) % len(line_styles)
+ ]
+ marker = markers[
+ baseline_idx % 4
+ ] # Use first 4 markers for baseline
+ label = f"{hostname} (Baseline)"
+ baseline_idx += 1
+
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate with alpha for better visibility
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ color=color,
+ linestyle=linestyle,
+ marker=marker,
+ linewidth=1.5,
+ markersize=5,
+ label=label,
+ alpha=0.8,
+ )
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ color=color,
+ linestyle=linestyle,
+ marker=marker,
+ linewidth=1.5,
+ markersize=5,
+ label=label,
+ alpha=0.8,
+ )
+
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Milvus Insert Rate by Node")
+ ax1.grid(True, alpha=0.3)
+ # Position legend outside plot area for better visibility with many nodes
+ ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
+
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Milvus Insert Time by Node")
+ ax2.grid(True, alpha=0.3)
+ # Position legend outside plot area for better visibility with many nodes
+ ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
+
+ plt.suptitle(
+ "Insert Performance Analysis: Baseline vs Development",
+ fontsize=14,
+ y=1.02,
+ )
+ else:
+ # Single node mode: original behavior
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # Extract insert data from single node
+ hostname = list(node_performance.keys())[0] if node_performance else None
+ if hostname:
+ perf_data = node_performance[hostname]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ "b-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title(f"Vector Insert Rate Performance - {hostname}")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ "r-o",
+ linewidth=2,
+ markersize=6,
+ )
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title(f"Vector Insert Time Performance - {hostname}")
+ ax2.grid(True, alpha=0.3)
plt.tight_layout()
output_file = os.path.join(
@@ -739,52 +1104,110 @@ class ResultsAnalyzer:
plt.close()
def _plot_query_performance(self):
- """Plot query performance metrics"""
+ """Plot query performance metrics comparing baseline vs dev nodes"""
if not self.results_data:
return
- # Collect query performance data
- query_data = []
+ # Group data by filesystem configuration
+ fs_groups = {}
for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_groups:
+ fs_groups[config_key] = {"baseline": [], "dev": []}
+
query_perf = result.get("query_performance", {})
- for topk, topk_data in query_perf.items():
- for batch, batch_data in topk_data.items():
- query_data.append(
- {
- "topk": topk.replace("topk_", ""),
- "batch": batch.replace("batch_", ""),
- "qps": batch_data.get("queries_per_second", 0),
- "avg_time": batch_data.get("average_time_seconds", 0)
- * 1000, # Convert to ms
- }
- )
+ if query_perf:
+ node_type = "dev" if is_dev else "baseline"
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ fs_groups[config_key][node_type].append(
+ {
+ "hostname": hostname,
+ "topk": topk.replace("topk_", ""),
+ "batch": batch.replace("batch_", ""),
+ "qps": batch_data.get("queries_per_second", 0),
+ "avg_time": batch_data.get("average_time_seconds", 0)
+ * 1000,
+ }
+ )
- if not query_data:
+ if not fs_groups:
return
- df = pd.DataFrame(query_data)
+ # Create subplots for each filesystem config
+ n_configs = len(fs_groups)
+ fig_height = max(8, 4 * n_configs)
+ fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height))
- # Create subplots
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+ if n_configs == 1:
+ axes = axes.reshape(1, -1)
- # QPS heatmap
- qps_pivot = df.pivot_table(
- values="qps", index="topk", columns="batch", aggfunc="mean"
- )
- sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
- ax1.set_title("Queries Per Second (QPS)")
- ax1.set_xlabel("Batch Size")
- ax1.set_ylabel("Top-K")
-
- # Latency heatmap
- latency_pivot = df.pivot_table(
- values="avg_time", index="topk", columns="batch", aggfunc="mean"
- )
- sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
- ax2.set_title("Average Query Latency (ms)")
- ax2.set_xlabel("Batch Size")
- ax2.set_ylabel("Top-K")
+ for idx, (config_key, data) in enumerate(sorted(fs_groups.items())):
+ # Create DataFrames for baseline and dev
+ baseline_df = (
+ pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame()
+ )
+ dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame()
+
+ # Baseline QPS heatmap
+ ax_base = axes[idx][0]
+ if not baseline_df.empty:
+ baseline_pivot = baseline_df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(
+ baseline_pivot,
+ annot=True,
+ fmt=".1f",
+ ax=ax_base,
+ cmap="Greens",
+ cbar_kws={"label": "QPS"},
+ )
+ ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
+ ax_base.set_xlabel("Batch Size")
+ ax_base.set_ylabel("Top-K")
+ else:
+ ax_base.text(
+ 0.5,
+ 0.5,
+ f"No baseline data for {config_key}",
+ ha="center",
+ va="center",
+ transform=ax_base.transAxes,
+ )
+ ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
+ # Dev QPS heatmap
+ ax_dev = axes[idx][1]
+ if not dev_df.empty:
+ dev_pivot = dev_df.pivot_table(
+ values="qps", index="topk", columns="batch", aggfunc="mean"
+ )
+ sns.heatmap(
+ dev_pivot,
+ annot=True,
+ fmt=".1f",
+ ax=ax_dev,
+ cmap="Blues",
+ cbar_kws={"label": "QPS"},
+ )
+ ax_dev.set_title(f"{config_key.upper()} - Development QPS")
+ ax_dev.set_xlabel("Batch Size")
+ ax_dev.set_ylabel("Top-K")
+ else:
+ ax_dev.text(
+ 0.5,
+ 0.5,
+ f"No dev data for {config_key}",
+ ha="center",
+ va="center",
+ transform=ax_dev.transAxes,
+ )
+ ax_dev.set_title(f"{config_key.upper()} - Development QPS")
+
+ plt.suptitle("Query Performance: Baseline vs Development", fontsize=16, y=1.02)
plt.tight_layout()
output_file = os.path.join(
self.output_dir,
@@ -796,32 +1219,101 @@ class ResultsAnalyzer:
plt.close()
def _plot_index_performance(self):
- """Plot index creation performance"""
- iterations = []
- index_times = []
+ """Plot index creation performance comparing baseline vs dev"""
+ # Group by filesystem configuration
+ fs_groups = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_groups:
+ fs_groups[config_key] = {"baseline": [], "dev": []}
- for i, result in enumerate(self.results_data):
index_perf = result.get("index_performance", {})
if index_perf:
- iterations.append(i + 1)
- index_times.append(index_perf.get("creation_time_seconds", 0))
+ time = index_perf.get("creation_time_seconds", 0)
+ if time > 0:
+ node_type = "dev" if is_dev else "baseline"
+ fs_groups[config_key][node_type].append(time)
- if not index_times:
+ if not fs_groups:
return
- plt.figure(figsize=(10, 6))
- plt.bar(iterations, index_times, alpha=0.7, color="green")
- plt.xlabel("Iteration")
- plt.ylabel("Index Creation Time (seconds)")
- plt.title("Index Creation Performance")
- plt.grid(True, alpha=0.3)
-
- # Add average line
- avg_time = np.mean(index_times)
- plt.axhline(
- y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
+ # Create comparison bar chart
+ fig, ax = plt.subplots(figsize=(14, 8))
+
+ configs = sorted(fs_groups.keys())
+ x = np.arange(len(configs))
+ width = 0.35
+
+ # Calculate averages for each config
+ baseline_avgs = []
+ dev_avgs = []
+ baseline_stds = []
+ dev_stds = []
+
+ for config in configs:
+ baseline_times = fs_groups[config]["baseline"]
+ dev_times = fs_groups[config]["dev"]
+
+ baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0)
+ dev_avgs.append(np.mean(dev_times) if dev_times else 0)
+ baseline_stds.append(np.std(baseline_times) if baseline_times else 0)
+ dev_stds.append(np.std(dev_times) if dev_times else 0)
+
+ # Create bars
+ bars1 = ax.bar(
+ x - width / 2,
+ baseline_avgs,
+ width,
+ yerr=baseline_stds,
+ label="Baseline",
+ color="#4CAF50",
+ capsize=5,
+ )
+ bars2 = ax.bar(
+ x + width / 2,
+ dev_avgs,
+ width,
+ yerr=dev_stds,
+ label="Development",
+ color="#2196F3",
+ capsize=5,
)
- plt.legend()
+
+ # Add value labels on bars
+ for bar, val in zip(bars1, baseline_avgs):
+ if val > 0:
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.3f}s",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ for bar, val in zip(bars2, dev_avgs):
+ if val > 0:
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.3f}s",
+ ha="center",
+ va="bottom",
+ fontsize=9,
+ )
+
+ ax.set_xlabel("Filesystem Configuration", fontsize=12)
+ ax.set_ylabel("Index Creation Time (seconds)", fontsize=12)
+ ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14)
+ ax.set_xticks(x)
+ ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right")
+ ax.legend(loc="upper right")
+ ax.grid(True, alpha=0.3, axis="y")
output_file = os.path.join(
self.output_dir,
@@ -833,61 +1325,148 @@ class ResultsAnalyzer:
plt.close()
def _plot_performance_matrix(self):
- """Plot comprehensive performance comparison matrix"""
+ """Plot performance comparison matrix for each filesystem config"""
if len(self.results_data) < 2:
return
- # Extract key metrics for comparison
- metrics = []
- for i, result in enumerate(self.results_data):
+ # Group by filesystem configuration
+ fs_metrics = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+ fs_type, block_size, config_key = self._extract_filesystem_config(result)
+
+ if config_key not in fs_metrics:
+ fs_metrics[config_key] = {"baseline": [], "dev": []}
+
+ # Collect metrics
insert_perf = result.get("insert_performance", {})
index_perf = result.get("index_performance", {})
+ query_perf = result.get("query_performance", {})
metric = {
- "iteration": i + 1,
+ "hostname": hostname,
"insert_rate": insert_perf.get("vectors_per_second", 0),
"index_time": index_perf.get("creation_time_seconds", 0),
}
- # Add query metrics
- query_perf = result.get("query_performance", {})
+ # Get representative query performance (topk_10, batch_1)
if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
"queries_per_second", 0
)
+ else:
+ metric["query_qps"] = 0
- metrics.append(metric)
+ node_type = "dev" if is_dev else "baseline"
+ fs_metrics[config_key][node_type].append(metric)
- df = pd.DataFrame(metrics)
+ if not fs_metrics:
+ return
- # Normalize metrics for comparison
- numeric_cols = ["insert_rate", "index_time", "query_qps"]
- for col in numeric_cols:
- if col in df.columns:
- df[f"{col}_norm"] = (df[col] - df[col].min()) / (
- df[col].max() - df[col].min() + 1e-6
- )
+ # Create subplots for each filesystem
+ n_configs = len(fs_metrics)
+ n_cols = min(3, n_configs)
+ n_rows = (n_configs + n_cols - 1) // n_cols
+
+ fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5))
+ if n_rows == 1 and n_cols == 1:
+ axes = [[axes]]
+ elif n_rows == 1:
+ axes = [axes]
+ elif n_cols == 1:
+ axes = [[ax] for ax in axes]
+
+ for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())):
+ row = idx // n_cols
+ col = idx % n_cols
+ ax = axes[row][col]
+
+ # Calculate averages
+ baseline_metrics = data["baseline"]
+ dev_metrics = data["dev"]
+
+ if baseline_metrics and dev_metrics:
+ categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"]
+
+ baseline_avg = [
+ np.mean([m["insert_rate"] for m in baseline_metrics]),
+ np.mean([m["index_time"] for m in baseline_metrics]),
+ np.mean([m["query_qps"] for m in baseline_metrics]),
+ ]
- # Create radar chart
- fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
+ dev_avg = [
+ np.mean([m["insert_rate"] for m in dev_metrics]),
+ np.mean([m["index_time"] for m in dev_metrics]),
+ np.mean([m["query_qps"] for m in dev_metrics]),
+ ]
- angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
- angles += angles[:1] # Complete the circle
+ x = np.arange(len(categories))
+ width = 0.35
- for i, row in df.iterrows():
- values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
- values += values[:1] # Complete the circle
+ bars1 = ax.bar(
+ x - width / 2,
+ baseline_avg,
+ width,
+ label="Baseline",
+ color="#4CAF50",
+ )
+ bars2 = ax.bar(
+ x + width / 2, dev_avg, width, label="Development", color="#2196F3"
+ )
- ax.plot(
- angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
- )
- ax.fill(angles, values, alpha=0.25)
+ # Add value labels
+ for bar, val in zip(bars1, baseline_avg):
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.0f}" if val > 100 else f"{val:.2f}",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
- ax.set_xticks(angles[:-1])
- ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
- ax.set_ylim(0, 1)
- ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
- ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
+ for bar, val in zip(bars2, dev_avg):
+ height = bar.get_height()
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height,
+ f"{val:.0f}" if val > 100 else f"{val:.2f}",
+ ha="center",
+ va="bottom",
+ fontsize=8,
+ )
+
+ ax.set_xlabel("Metrics")
+ ax.set_ylabel("Value")
+ ax.set_title(f"{config_key.upper()}")
+ ax.set_xticks(x)
+ ax.set_xticklabels(categories)
+ ax.legend(loc="upper right", fontsize=8)
+ ax.grid(True, alpha=0.3, axis="y")
+ else:
+ ax.text(
+ 0.5,
+ 0.5,
+ f"Insufficient data\nfor {config_key}",
+ ha="center",
+ va="center",
+ transform=ax.transAxes,
+ )
+ ax.set_title(f"{config_key.upper()}")
+
+ # Hide unused subplots
+ for idx in range(n_configs, n_rows * n_cols):
+ row = idx // n_cols
+ col = idx % n_cols
+ axes[row][col].set_visible(False)
+
+ plt.suptitle(
+ "Performance Comparison Matrix: Baseline vs Development",
+ fontsize=14,
+ y=1.02,
+ )
output_file = os.path.join(
self.output_dir,
@@ -898,6 +1477,149 @@ class ResultsAnalyzer:
)
plt.close()
+ def _plot_filesystem_comparison(self):
+ """Plot node performance comparison chart"""
+ if len(self.results_data) < 2:
+ return
+
+ # Group results by node
+ node_performance = {}
+
+ for result in self.results_data:
+ hostname, is_dev = self._extract_node_info(result)
+
+ if hostname not in node_performance:
+ node_performance[hostname] = {
+ "insert_rates": [],
+ "index_times": [],
+ "query_qps": [],
+ "is_dev": is_dev,
+ }
+
+ # Collect metrics
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+ node_performance[hostname]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
+ )
+
+ index_perf = result.get("index_performance", {})
+ if index_perf:
+ node_performance[hostname]["index_times"].append(
+ index_perf.get("creation_time_seconds", 0)
+ )
+
+ # Get top-10 batch-1 query performance as representative
+ query_perf = result.get("query_performance", {})
+ if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
+ qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0)
+ node_performance[hostname]["query_qps"].append(qps)
+
+ # Only create comparison if we have multiple nodes
+ if len(node_performance) > 1:
+ # Calculate averages
+ node_metrics = {}
+ for hostname, perf_data in node_performance.items():
+ node_metrics[hostname] = {
+ "avg_insert_rate": (
+ np.mean(perf_data["insert_rates"])
+ if perf_data["insert_rates"]
+ else 0
+ ),
+ "avg_index_time": (
+ np.mean(perf_data["index_times"])
+ if perf_data["index_times"]
+ else 0
+ ),
+ "avg_query_qps": (
+ np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0
+ ),
+ "is_dev": perf_data["is_dev"],
+ }
+
+ # Create comparison bar chart with more space
+ fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
+
+ # Sort nodes with baseline first
+ sorted_nodes = sorted(
+ node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0])
+ )
+ node_names = [hostname for hostname, _ in sorted_nodes]
+
+ # Use different colors for baseline vs dev
+ colors = [
+ "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3"
+ for hostname in node_names
+ ]
+
+ # Add labels for clarity
+ labels = [
+ f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})"
+ for hostname in node_names
+ ]
+
+ # Insert rate comparison
+ insert_rates = [
+ node_metrics[hostname]["avg_insert_rate"] for hostname in node_names
+ ]
+ bars1 = ax1.bar(labels, insert_rates, color=colors)
+ ax1.set_title("Average Milvus Insert Rate by Node")
+ ax1.set_ylabel("Vectors/Second")
+ # Rotate labels for better readability
+ ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Index time comparison (lower is better)
+ index_times = [
+ node_metrics[hostname]["avg_index_time"] for hostname in node_names
+ ]
+ bars2 = ax2.bar(labels, index_times, color=colors)
+ ax2.set_title("Average Milvus Index Time by Node")
+ ax2.set_ylabel("Seconds (Lower is Better)")
+ ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Query QPS comparison
+ query_qps = [
+ node_metrics[hostname]["avg_query_qps"] for hostname in node_names
+ ]
+ bars3 = ax3.bar(labels, query_qps, color=colors)
+ ax3.set_title("Average Milvus Query QPS by Node")
+ ax3.set_ylabel("Queries/Second")
+ ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
+
+ # Add value labels on bars
+ for bars, values in [
+ (bars1, insert_rates),
+ (bars2, index_times),
+ (bars3, query_qps),
+ ]:
+ for bar, value in zip(bars, values):
+ height = bar.get_height()
+ ax = bar.axes
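+                    # Nudge the label 1% above the bar so it sits just over the bar top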
+ ax.text(
+ bar.get_x() + bar.get_width() / 2.0,
+ height + height * 0.01,
+ f"{value:.1f}",
+ ha="center",
+ va="bottom",
+ fontsize=10,
+ )
+
+ plt.suptitle(
+ "Milvus Performance Comparison: Baseline vs Development Nodes",
+ fontsize=16,
+ y=1.02,
+ )
+ plt.tight_layout()
+
+ output_file = os.path.join(
+ self.output_dir,
+ f"filesystem_comparison.{self.config.get('graph_format', 'png')}",
+ )
+ plt.savefig(
+ output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
+ )
+ plt.close()
+
def analyze(self) -> bool:
"""Run complete analysis"""
self.logger.info("Starting results analysis...")
diff --git a/workflows/ai/scripts/generate_graphs.py b/workflows/ai/scripts/generate_graphs.py
index 2e183e86..fafc62bf 100755
--- a/workflows/ai/scripts/generate_graphs.py
+++ b/workflows/ai/scripts/generate_graphs.py
@@ -9,7 +9,6 @@ import sys
import glob
import numpy as np
import matplotlib
-
matplotlib.use("Agg") # Use non-interactive backend
import matplotlib.pyplot as plt
from datetime import datetime
@@ -17,6 +16,66 @@ from pathlib import Path
from collections import defaultdict
+def _extract_filesystem_config(result):
+ """Extract filesystem type and block size from result data.
+ Returns (fs_type, block_size, config_key)"""
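+    # For example, a result file named "results_debian13-ai-xfs-16k-4ks_1.json"
+    # maps to ("xfs", "16k", "xfs-16k"), while a btrfs host with no block size
+    # in its name maps to ("btrfs", "default", "btrfs").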
+ filename = result.get("_file", "")
+
+ # Primary: Extract filesystem type from filename (more reliable than JSON)
+ fs_type = "unknown"
+ block_size = "default"
+
+ if "xfs" in filename:
+ fs_type = "xfs"
+ # Check larger sizes first to avoid substring matches
+        if "64k" in filename:
+            block_size = "64k"
+        elif "32k" in filename:
+            block_size = "32k"
+        elif "16k" in filename:
+            block_size = "16k"
+        elif "4k" in filename:
+            block_size = "4k"
+    elif "ext4" in filename:
+        fs_type = "ext4"
+        if "16k" in filename:
+            block_size = "16k"
+        elif "4k" in filename:
+            block_size = "4k"
+ elif "btrfs" in filename:
+ fs_type = "btrfs"
+
+ # Fallback: Check JSON data if filename parsing failed
+ if fs_type == "unknown":
+ fs_type = result.get("filesystem", "unknown")
+
+ # Create descriptive config key
+ config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
+ return fs_type, block_size, config_key
+
+
+def _extract_node_info(result):
+ """Extract node hostname and determine if it's a dev node.
+ Returns (hostname, is_dev_node)"""
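+    # Example: if system_info carries no hostname, the file
+    # "results_debian13-ai-xfs-16k-4ks-dev_2.json" reduces to
+    # "debian13-ai-xfs-16k-4ks-dev" (iteration suffix stripped) and the
+    # trailing "-dev" marks it as a dev node.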
+ # Get hostname from system_info (preferred) or fall back to filename
+ system_info = result.get("system_info", {})
+ hostname = system_info.get("hostname", "")
+
+ # If no hostname in system_info, try extracting from filename
+ if not hostname:
+ filename = result.get("_file", "")
+ # Remove results_ prefix and .json suffix
+ hostname = filename.replace("results_", "").replace(".json", "")
+ # Remove iteration number if present (_1, _2, etc.)
+ if "_" in hostname and hostname.split("_")[-1].isdigit():
+ hostname = "_".join(hostname.split("_")[:-1])
+
+ # Determine if this is a dev node
+ is_dev = hostname.endswith("-dev")
+
+ return hostname, is_dev
+
+
def load_results(results_dir):
"""Load all JSON result files from the directory"""
results = []
@@ -27,63 +86,8 @@ def load_results(results_dir):
try:
with open(json_file, "r") as f:
data = json.load(f)
- # Extract filesystem info - prefer from JSON data over filename
- filename = os.path.basename(json_file)
-
- # First, try to get filesystem from the JSON data itself
- fs_type = data.get("filesystem", None)
-
- # If not in JSON, try to parse from filename (backwards compatibility)
- if not fs_type:
- parts = (
- filename.replace("results_", "").replace(".json", "").split("-")
- )
-
- # Parse host info
- if "debian13-ai-" in filename:
- host_parts = (
- filename.replace("results_debian13-ai-", "")
- .replace("_1.json", "")
- .replace("_2.json", "")
- .replace("_3.json", "")
- .split("-")
- )
- if "xfs" in host_parts[0]:
- fs_type = "xfs"
- # Extract block size (e.g., "4k", "16k", etc.)
- block_size = (
- host_parts[1] if len(host_parts) > 1 else "unknown"
- )
- elif "ext4" in host_parts[0]:
- fs_type = "ext4"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "btrfs" in host_parts[0]:
- fs_type = "btrfs"
- block_size = "default"
- else:
- fs_type = "unknown"
- block_size = "unknown"
- else:
- fs_type = "unknown"
- block_size = "unknown"
- else:
- # If filesystem came from JSON, set appropriate block size
- if fs_type == "btrfs":
- block_size = "default"
- elif fs_type in ["ext4", "xfs"]:
- block_size = data.get("block_size", "4k")
- else:
- block_size = data.get("block_size", "default")
-
- is_dev = "dev" in filename
-
- # Use filesystem from JSON if available, otherwise use parsed value
- if "filesystem" not in data:
- data["filesystem"] = fs_type
- data["block_size"] = block_size
- data["is_dev"] = is_dev
- data["filename"] = filename
-
+ # Add filename for filesystem detection
+ data["_file"] = os.path.basename(json_file)
results.append(data)
except Exception as e:
print(f"Error loading {json_file}: {e}")
@@ -91,1023 +95,240 @@ def load_results(results_dir):
return results
-def create_filesystem_comparison_chart(results, output_dir):
- """Create a bar chart comparing performance across filesystems"""
- # Group by filesystem and baseline/dev
- fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- category = "dev" if result.get("is_dev", False) else "baseline"
-
- # Extract actual performance data from results
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
- fs_data[fs][category].append(insert_qps)
-
- # Prepare data for plotting
- filesystems = list(fs_data.keys())
- baseline_means = [
- np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
- for fs in filesystems
- ]
- dev_means = [
- np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
- ]
-
- x = np.arange(len(filesystems))
- width = 0.35
-
- fig, ax = plt.subplots(figsize=(10, 6))
- baseline_bars = ax.bar(
- x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
- )
- dev_bars = ax.bar(
- x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
- )
-
- ax.set_xlabel("Filesystem")
- ax.set_ylabel("Insert QPS")
- ax.set_title("Vector Database Performance by Filesystem")
- ax.set_xticks(x)
- ax.set_xticklabels(filesystems)
- ax.legend()
- ax.grid(True, alpha=0.3)
-
- # Add value labels on bars
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax.annotate(
- f"{height:.0f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- )
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
- plt.close()
-
-
-def create_block_size_analysis(results, output_dir):
- """Create analysis for different block sizes (XFS specific)"""
- # Filter XFS results
- xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
-
- if not xfs_results:
+def create_simple_performance_trends(results, output_dir):
+    """Create performance trends chart grouped by storage filesystem"""
+ if not results:
return
- # Group by block size
- block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in xfs_results:
- block_size = result.get("block_size", "unknown")
- category = "dev" if result.get("is_dev", False) else "baseline"
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
- block_size_data[block_size][category].append(insert_qps)
-
- # Sort block sizes
- block_sizes = sorted(
- block_size_data.keys(),
- key=lambda x: (
- int(x.replace("k", "").replace("s", ""))
- if x not in ["unknown", "default"]
- else 0
- ),
- )
-
- # Create grouped bar chart
- baseline_means = [
- (
- np.mean(block_size_data[bs]["baseline"])
- if block_size_data[bs]["baseline"]
- else 0
- )
- for bs in block_sizes
- ]
- dev_means = [
- np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
- for bs in block_sizes
- ]
-
- x = np.arange(len(block_sizes))
- width = 0.35
-
- fig, ax = plt.subplots(figsize=(12, 6))
- baseline_bars = ax.bar(
- x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
- )
- dev_bars = ax.bar(
- x + width / 2, dev_means, width, label="Development", color="#d62728"
- )
-
- ax.set_xlabel("Block Size")
- ax.set_ylabel("Insert QPS")
- ax.set_title("XFS Performance by Block Size")
- ax.set_xticks(x)
- ax.set_xticklabels(block_sizes)
- ax.legend()
- ax.grid(True, alpha=0.3)
-
- # Add value labels
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax.annotate(
- f"{height:.0f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- )
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
- plt.close()
-
-
-def create_heatmap_analysis(results, output_dir):
- """Create a heatmap showing AVERAGE performance across all test iterations"""
- # Group data by configuration and version, collecting ALL values for averaging
- config_data = defaultdict(
- lambda: {
- "baseline": {"insert": [], "query": [], "count": 0},
- "dev": {"insert": [], "query": [], "count": 0},
- }
- )
+    # Group results by filesystem configuration
+    fs_performance = defaultdict(lambda: {
+        "insert_rates": [],
+        "insert_times": [],
+        "iterations": [],
+    })
for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- config = f"{fs}-{block_size}"
- version = "dev" if result.get("is_dev", False) else "baseline"
-
- # Get actual insert performance
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
-
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- # Collect all values for averaging
- config_data[config][version]["insert"].append(insert_qps)
- config_data[config][version]["query"].append(query_qps)
- config_data[config][version]["count"] += 1
-
- # Sort configurations
- configs = sorted(config_data.keys())
-
- # Calculate averages for heatmap
- insert_baseline = []
- insert_dev = []
- query_baseline = []
- query_dev = []
- iteration_counts = {"baseline": 0, "dev": 0}
-
- for c in configs:
- # Calculate average insert QPS
- baseline_insert_vals = config_data[c]["baseline"]["insert"]
- insert_baseline.append(
- np.mean(baseline_insert_vals) if baseline_insert_vals else 0
- )
-
- dev_insert_vals = config_data[c]["dev"]["insert"]
- insert_dev.append(np.mean(dev_insert_vals) if dev_insert_vals else 0)
-
- # Calculate average query QPS
- baseline_query_vals = config_data[c]["baseline"]["query"]
- query_baseline.append(
- np.mean(baseline_query_vals) if baseline_query_vals else 0
- )
-
- dev_query_vals = config_data[c]["dev"]["query"]
- query_dev.append(np.mean(dev_query_vals) if dev_query_vals else 0)
-
- # Track iteration counts
- iteration_counts["baseline"] = max(
- iteration_counts["baseline"], len(baseline_insert_vals)
- )
- iteration_counts["dev"] = max(iteration_counts["dev"], len(dev_insert_vals))
-
- # Create figure with custom heatmap
- fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
-
- # Create data matrices
- insert_data = np.array([insert_baseline, insert_dev]).T
- query_data = np.array([query_baseline, query_dev]).T
-
- # Insert QPS heatmap
- im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
- ax1.set_xticks([0, 1])
- ax1.set_xticklabels(["Baseline", "Development"])
- ax1.set_yticks(range(len(configs)))
- ax1.set_yticklabels(configs)
- ax1.set_title(
- f"Insert Performance - AVERAGE across {iteration_counts['baseline']} iterations\n(1M vectors, 128 dims, HNSW index)"
- )
- ax1.set_ylabel("Configuration")
-
- # Add text annotations with dynamic color based on background
- # Get the colormap to determine actual colors
- cmap1 = plt.cm.YlOrRd
- norm1 = plt.Normalize(vmin=insert_data.min(), vmax=insert_data.max())
-
- for i in range(len(configs)):
- for j in range(2):
- # Get the actual color from the colormap
- val = insert_data[i, j]
- rgba = cmap1(norm1(val))
- # Calculate luminance using standard formula
- # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
- luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
- # Use white text on dark backgrounds (low luminance)
- text_color = "white" if luminance < 0.5 else "black"
+        fs_type, block_size, config_key = _extract_filesystem_config(result)
+
- # Show average value with indicator
- text = ax1.text(
- j,
- i,
- f"{int(insert_data[i, j])}\n(avg)",
- ha="center",
- va="center",
- color=text_color,
- fontweight="bold",
- fontsize=9,
+ insert_perf = result.get("insert_performance", {})
+ if insert_perf:
+            fs_performance[config_key]["insert_rates"].append(
+ insert_perf.get("vectors_per_second", 0)
)
-
- # Add colorbar
- cbar1 = plt.colorbar(im1, ax=ax1)
- cbar1.set_label("Insert QPS")
-
- # Query QPS heatmap
- im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
- ax2.set_xticks([0, 1])
- ax2.set_xticklabels(["Baseline", "Development"])
- ax2.set_yticks(range(len(configs)))
- ax2.set_yticklabels(configs)
- ax2.set_title(
- f"Query Performance - AVERAGE across {iteration_counts['dev']} iterations\n(1M vectors, 128 dims, HNSW index)"
- )
-
- # Add text annotations with dynamic color based on background
- # Get the colormap to determine actual colors
- cmap2 = plt.cm.YlGnBu
- norm2 = plt.Normalize(vmin=query_data.min(), vmax=query_data.max())
-
- for i in range(len(configs)):
- for j in range(2):
- # Get the actual color from the colormap
- val = query_data[i, j]
- rgba = cmap2(norm2(val))
- # Calculate luminance using standard formula
- # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
- luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
- # Use white text on dark backgrounds (low luminance)
- text_color = "white" if luminance < 0.5 else "black"
-
- # Show average value with indicator
- text = ax2.text(
- j,
- i,
- f"{int(query_data[i, j])}\n(avg)",
- ha="center",
- va="center",
- color=text_color,
- fontweight="bold",
- fontsize=9,
+ fs_performance[config_key]["insert_times"].append(
+ insert_perf.get("total_time_seconds", 0)
+ )
+ fs_performance[config_key]["iterations"].append(
+ len(fs_performance[config_key]["insert_rates"])
)
- # Add colorbar
- cbar2 = plt.colorbar(im2, ax=ax2)
- cbar2.set_label("Query QPS")
-
- # Add overall figure title
- fig.suptitle(
- "Performance Heatmap - Showing AVERAGES across Multiple Test Iterations",
- fontsize=14,
- fontweight="bold",
- y=1.02,
- )
-
- plt.tight_layout()
- plt.savefig(
- os.path.join(output_dir, "performance_heatmap.png"),
- dpi=150,
- bbox_inches="tight",
- )
- plt.close()
-
-
-def create_performance_trends(results, output_dir):
- """Create line charts showing performance trends"""
- # Group by filesystem type
- fs_types = defaultdict(
- lambda: {
- "configs": [],
- "baseline_insert": [],
- "dev_insert": [],
- "baseline_query": [],
- "dev_query": [],
- }
- )
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- config = f"{block_size}"
-
- if config not in fs_types[fs]["configs"]:
- fs_types[fs]["configs"].append(config)
- fs_types[fs]["baseline_insert"].append(0)
- fs_types[fs]["dev_insert"].append(0)
- fs_types[fs]["baseline_query"].append(0)
- fs_types[fs]["dev_query"].append(0)
-
- idx = fs_types[fs]["configs"].index(config)
-
- # Calculate average query QPS from all test configurations
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- if result.get("is_dev", False):
- if "insert_performance" in result:
- fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
- "vectors_per_second", 0
- )
- fs_types[fs]["dev_query"][idx] = query_qps
- else:
- if "insert_performance" in result:
- fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
- "vectors_per_second", 0
- )
- fs_types[fs]["baseline_query"][idx] = query_qps
-
- # Create separate plots for each filesystem
- for fs, data in fs_types.items():
- if not data["configs"]:
- continue
-
- fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
- x = range(len(data["configs"]))
-
- # Insert performance
- ax1.plot(
- x,
- data["baseline_insert"],
- "o-",
- label="Baseline",
- linewidth=2,
- markersize=8,
- )
- ax1.plot(
- x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
- )
- ax1.set_xlabel("Configuration")
- ax1.set_ylabel("Insert QPS")
- ax1.set_title(f"{fs.upper()} Insert Performance")
- ax1.set_xticks(x)
- ax1.set_xticklabels(data["configs"])
- ax1.legend()
+ # Check if we have multi-filesystem data
+ if len(fs_performance) > 1:
+ # Multi-filesystem mode: separate lines for each filesystem
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
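+        # One line per filesystem config, cycling through matplotlib's
+        # single-letter color codes when there are more configs than colors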
+ colors = ["b", "r", "g", "m", "c", "y", "k"]
+ color_idx = 0
+
+ for config_key, perf_data in fs_performance.items():
+ if not perf_data["insert_rates"]:
+ continue
+
+ color = colors[color_idx % len(colors)]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ f"{color}-o",
+ linewidth=2,
+ markersize=6,
+ label=config_key.upper(),
+ )
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ f"{color}-o",
+ linewidth=2,
+ markersize=6,
+ label=config_key.upper(),
+ )
+
+ color_idx += 1
+
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Milvus Insert Rate by Storage Filesystem")
ax1.grid(True, alpha=0.3)
-
- # Query performance
- ax2.plot(
- x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
- )
- ax2.plot(
- x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
- )
- ax2.set_xlabel("Configuration")
- ax2.set_ylabel("Query QPS")
- ax2.set_title(f"{fs.upper()} Query Performance")
- ax2.set_xticks(x)
- ax2.set_xticklabels(data["configs"])
- ax2.legend()
+ ax1.legend()
+
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Milvus Insert Time by Storage Filesystem")
ax2.grid(True, alpha=0.3)
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
- plt.close()
-
-
-def create_simple_performance_trends(results, output_dir):
- """Create a simple performance trends chart for basic Milvus testing"""
- if not results:
- return
-
- # Extract configuration from first result for display
- config_text = ""
- if results:
- first_result = results[0]
- if "config" in first_result:
- cfg = first_result["config"]
- config_text = (
- f"Test Config:\n"
- f"• {cfg.get('vector_dataset_size', 'N/A'):,} vectors/iteration\n"
- f"• {cfg.get('vector_dimensions', 'N/A')} dimensions\n"
- f"• {cfg.get('index_type', 'N/A')} index"
+ ax2.legend()
+ else:
+ # Single filesystem mode: original behavior
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
+
+ # Extract insert data from single filesystem
+ config_key = list(fs_performance.keys())[0] if fs_performance else None
+ if config_key:
+ perf_data = fs_performance[config_key]
+ iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
+
+ # Plot insert rate
+ ax1.plot(
+ iterations,
+ perf_data["insert_rates"],
+ "b-o",
+ linewidth=2,
+ markersize=6,
)
-
- # Separate baseline and dev results
- baseline_results = [r for r in results if not r.get("is_dev", False)]
- dev_results = [r for r in results if r.get("is_dev", False)]
-
- if not baseline_results and not dev_results:
- return
-
- fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
-
- # Prepare data
- baseline_insert = []
- baseline_query = []
- dev_insert = []
- dev_query = []
- labels = []
-
- # Process baseline results
- for i, result in enumerate(baseline_results):
- if "insert_performance" in result:
- baseline_insert.append(
- result["insert_performance"].get("vectors_per_second", 0)
+ ax1.set_xlabel("Iteration")
+ ax1.set_ylabel("Vectors/Second")
+ ax1.set_title("Vector Insert Rate Performance")
+ ax1.grid(True, alpha=0.3)
+
+ # Plot insert time
+ ax2.plot(
+ iterations,
+ perf_data["insert_times"],
+ "r-o",
+ linewidth=2,
+ markersize=6,
)
- else:
- baseline_insert.append(0)
-
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
- baseline_query.append(query_qps)
- labels.append(f"Iteration {i+1}")
-
- # Process dev results
- for result in dev_results:
- if "insert_performance" in result:
- dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
- else:
- dev_insert.append(0)
-
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
- dev_query.append(query_qps)
-
- x = range(len(baseline_results) if baseline_results else len(dev_results))
-
- # Insert performance - with visible markers for all points
- if baseline_insert:
- # Line plot with smaller markers
- ax1.plot(
- x,
- baseline_insert,
- "-",
- label="Baseline",
- linewidth=1.5,
- color="blue",
- alpha=0.6,
- )
- # Add distinct markers for each point
- ax1.scatter(
- x,
- baseline_insert,
- s=30,
- color="blue",
- alpha=0.8,
- edgecolors="darkblue",
- linewidth=0.5,
- zorder=5,
- )
- if dev_insert:
- # Line plot with smaller markers
- ax1.plot(
- x[: len(dev_insert)],
- dev_insert,
- "-",
- label="Development",
- linewidth=1.5,
- color="red",
- alpha=0.6,
- )
- # Add distinct markers for each point
- ax1.scatter(
- x[: len(dev_insert)],
- dev_insert,
- s=30,
- color="red",
- alpha=0.8,
- edgecolors="darkred",
- linewidth=0.5,
- marker="s",
- zorder=5,
- )
- ax1.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
- ax1.set_ylabel("Insert QPS (vectors/second)")
- ax1.set_title("Milvus Insert Performance")
-
- # Handle x-axis labels to prevent overlap
- num_points = len(x)
- if num_points > 20:
- # Show every 5th label for many iterations
- step = 5
- tick_positions = list(range(0, num_points, step))
- tick_labels = [
- labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
- ]
- ax1.set_xticks(tick_positions)
- ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
- elif num_points > 10:
- # Show every 2nd label for moderate iterations
- step = 2
- tick_positions = list(range(0, num_points, step))
- tick_labels = [
- labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
- ]
- ax1.set_xticks(tick_positions)
- ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
- else:
- # Show all labels for few iterations
- ax1.set_xticks(x)
- ax1.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
-
- ax1.legend()
- ax1.grid(True, alpha=0.3)
-
- # Add configuration text box - compact
- if config_text:
- ax1.text(
- 0.02,
- 0.98,
- config_text,
- transform=ax1.transAxes,
- fontsize=6,
- verticalalignment="top",
- bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
- )
-
- # Query performance - with visible markers for all points
- if baseline_query:
- # Line plot
- ax2.plot(
- x,
- baseline_query,
- "-",
- label="Baseline",
- linewidth=1.5,
- color="blue",
- alpha=0.6,
- )
- # Add distinct markers for each point
- ax2.scatter(
- x,
- baseline_query,
- s=30,
- color="blue",
- alpha=0.8,
- edgecolors="darkblue",
- linewidth=0.5,
- zorder=5,
- )
- if dev_query:
- # Line plot
- ax2.plot(
- x[: len(dev_query)],
- dev_query,
- "-",
- label="Development",
- linewidth=1.5,
- color="red",
- alpha=0.6,
- )
- # Add distinct markers for each point
- ax2.scatter(
- x[: len(dev_query)],
- dev_query,
- s=30,
- color="red",
- alpha=0.8,
- edgecolors="darkred",
- linewidth=0.5,
- marker="s",
- zorder=5,
- )
- ax2.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
- ax2.set_ylabel("Query QPS (queries/second)")
- ax2.set_title("Milvus Query Performance")
-
- # Handle x-axis labels to prevent overlap
- num_points = len(x)
- if num_points > 20:
- # Show every 5th label for many iterations
- step = 5
- tick_positions = list(range(0, num_points, step))
- tick_labels = [
- labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
- ]
- ax2.set_xticks(tick_positions)
- ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
- elif num_points > 10:
- # Show every 2nd label for moderate iterations
- step = 2
- tick_positions = list(range(0, num_points, step))
- tick_labels = [
- labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
- ]
- ax2.set_xticks(tick_positions)
- ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
- else:
- # Show all labels for few iterations
- ax2.set_xticks(x)
- ax2.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
-
- ax2.legend()
- ax2.grid(True, alpha=0.3)
-
- # Add configuration text box - compact
- if config_text:
- ax2.text(
- 0.02,
- 0.98,
- config_text,
- transform=ax2.transAxes,
- fontsize=6,
- verticalalignment="top",
- bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
- )
-
+ ax2.set_xlabel("Iteration")
+ ax2.set_ylabel("Total Time (seconds)")
+ ax2.set_title("Vector Insert Time Performance")
+ ax2.grid(True, alpha=0.3)
+
plt.tight_layout()
plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
plt.close()
-def generate_summary_statistics(results, output_dir):
- """Generate summary statistics and save to JSON"""
- # Get unique filesystems, excluding "unknown"
- filesystems = set()
- for r in results:
- fs = r.get("filesystem", "unknown")
- if fs != "unknown":
- filesystems.add(fs)
-
- summary = {
- "total_tests": len(results),
- "filesystems_tested": sorted(list(filesystems)),
- "configurations": {},
- "performance_summary": {
- "best_insert_qps": {"value": 0, "config": ""},
- "best_query_qps": {"value": 0, "config": ""},
- "average_insert_qps": 0,
- "average_query_qps": 0,
- },
- }
-
- # Calculate statistics
- all_insert_qps = []
- all_query_qps = []
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "default")
- is_dev = "dev" if result.get("is_dev", False) else "baseline"
- config_name = f"{fs}-{block_size}-{is_dev}"
-
- # Get actual performance metrics
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
-
- # Calculate average query QPS
- query_qps = 0
- if "query_performance" in result:
- qp = result["query_performance"]
- total_qps = 0
- count = 0
- for topk_key in ["topk_1", "topk_10", "topk_100"]:
- if topk_key in qp:
- for batch_key in ["batch_1", "batch_10", "batch_100"]:
- if batch_key in qp[topk_key]:
- total_qps += qp[topk_key][batch_key].get(
- "queries_per_second", 0
- )
- count += 1
- if count > 0:
- query_qps = total_qps / count
-
- all_insert_qps.append(insert_qps)
- all_query_qps.append(query_qps)
-
- summary["configurations"][config_name] = {
- "insert_qps": insert_qps,
- "query_qps": query_qps,
- "host": result.get("host", "unknown"),
- }
-
- if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
- summary["performance_summary"]["best_insert_qps"] = {
- "value": insert_qps,
- "config": config_name,
- }
-
- if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
- summary["performance_summary"]["best_query_qps"] = {
- "value": query_qps,
- "config": config_name,
- }
-
- summary["performance_summary"]["average_insert_qps"] = (
- np.mean(all_insert_qps) if all_insert_qps else 0
- )
- summary["performance_summary"]["average_query_qps"] = (
- np.mean(all_query_qps) if all_query_qps else 0
- )
-
- # Save summary
- with open(os.path.join(output_dir, "summary.json"), "w") as f:
- json.dump(summary, f, indent=2)
-
- return summary
-
-
-def create_comprehensive_fs_comparison(results, output_dir):
- """Create comprehensive filesystem performance comparison including all configurations"""
- import matplotlib.pyplot as plt
- import numpy as np
- from collections import defaultdict
-
- # Collect data for all filesystem configurations
- config_data = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "")
-
- # Create configuration label
- if block_size and block_size != "default":
- config_label = f"{fs}-{block_size}"
- else:
- config_label = fs
-
- category = "dev" if result.get("is_dev", False) else "baseline"
-
- # Extract performance metrics
- if "insert_performance" in result:
- insert_qps = result["insert_performance"].get("vectors_per_second", 0)
- else:
- insert_qps = 0
-
- config_data[config_label][category].append(insert_qps)
-
- # Sort configurations for consistent display
- configs = sorted(config_data.keys())
-
- # Calculate means and standard deviations
- baseline_means = []
- baseline_stds = []
- dev_means = []
- dev_stds = []
-
- for config in configs:
- baseline_vals = config_data[config]["baseline"]
- dev_vals = config_data[config]["dev"]
-
- baseline_means.append(np.mean(baseline_vals) if baseline_vals else 0)
- baseline_stds.append(np.std(baseline_vals) if baseline_vals else 0)
- dev_means.append(np.mean(dev_vals) if dev_vals else 0)
- dev_stds.append(np.std(dev_vals) if dev_vals else 0)
-
- # Create the plot
- fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
-
- x = np.arange(len(configs))
- width = 0.35
-
- # Top plot: Absolute performance
- baseline_bars = ax1.bar(
- x - width / 2,
- baseline_means,
- width,
- yerr=baseline_stds,
- label="Baseline",
- color="#1f77b4",
- capsize=5,
- )
- dev_bars = ax1.bar(
- x + width / 2,
- dev_means,
- width,
- yerr=dev_stds,
- label="Development",
- color="#ff7f0e",
- capsize=5,
- )
-
- ax1.set_ylabel("Insert QPS")
- ax1.set_title("Vector Database Performance Across Filesystem Configurations")
- ax1.set_xticks(x)
- ax1.set_xticklabels(configs, rotation=45, ha="right")
- ax1.legend()
- ax1.grid(True, alpha=0.3)
-
- # Add value labels on bars
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax1.annotate(
- f"{height:.0f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- fontsize=8,
- )
-
- # Bottom plot: Percentage improvement (dev vs baseline)
- improvements = []
- for i in range(len(configs)):
- if baseline_means[i] > 0:
- improvement = ((dev_means[i] - baseline_means[i]) / baseline_means[i]) * 100
- else:
- improvement = 0
- improvements.append(improvement)
-
- colors = ["green" if x > 0 else "red" for x in improvements]
- improvement_bars = ax2.bar(x, improvements, color=colors, alpha=0.7)
-
- ax2.set_ylabel("Performance Change (%)")
- ax2.set_title("Development vs Baseline Performance Change")
- ax2.set_xticks(x)
- ax2.set_xticklabels(configs, rotation=45, ha="right")
- ax2.axhline(y=0, color="black", linestyle="-", linewidth=0.5)
- ax2.grid(True, alpha=0.3)
-
- # Add percentage labels
- for bar, val in zip(improvement_bars, improvements):
- ax2.annotate(
- f"{val:.1f}%",
- xy=(bar.get_x() + bar.get_width() / 2, val),
- xytext=(0, 3 if val > 0 else -15),
- textcoords="offset points",
- ha="center",
- va="bottom" if val > 0 else "top",
- fontsize=8,
- )
-
- plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "comprehensive_fs_comparison.png"), dpi=150)
- plt.close()
-
-
-def create_fs_latency_comparison(results, output_dir):
- """Create latency comparison across filesystems"""
- import matplotlib.pyplot as plt
- import numpy as np
- from collections import defaultdict
-
- # Collect latency data
- config_latency = defaultdict(lambda: {"baseline": [], "dev": []})
-
- for result in results:
- fs = result.get("filesystem", "unknown")
- block_size = result.get("block_size", "")
-
- if block_size and block_size != "default":
- config_label = f"{fs}-{block_size}"
- else:
- config_label = fs
-
- category = "dev" if result.get("is_dev", False) else "baseline"
-
- # Extract latency metrics
- if "query_performance" in result:
- latency_p99 = result["query_performance"].get("latency_p99_ms", 0)
- else:
- latency_p99 = 0
-
- if latency_p99 > 0:
- config_latency[config_label][category].append(latency_p99)
-
- if not config_latency:
+def create_heatmap_analysis(results, output_dir):
+ """Create multi-filesystem heatmap showing query performance"""
+ if not results:
return
- # Sort configurations
- configs = sorted(config_latency.keys())
-
- # Calculate statistics
- baseline_p99 = []
- dev_p99 = []
-
- for config in configs:
- baseline_vals = config_latency[config]["baseline"]
- dev_vals = config_latency[config]["dev"]
-
- baseline_p99.append(np.mean(baseline_vals) if baseline_vals else 0)
- dev_p99.append(np.mean(dev_vals) if dev_vals else 0)
-
- # Create plot
- fig, ax = plt.subplots(figsize=(12, 6))
-
- x = np.arange(len(configs))
- width = 0.35
-
- baseline_bars = ax.bar(
- x - width / 2, baseline_p99, width, label="Baseline P99", color="#9467bd"
- )
- dev_bars = ax.bar(
- x + width / 2, dev_p99, width, label="Development P99", color="#e377c2"
- )
-
- ax.set_xlabel("Filesystem Configuration")
- ax.set_ylabel("Latency P99 (ms)")
- ax.set_title("Query Latency (P99) Comparison Across Filesystems")
- ax.set_xticks(x)
- ax.set_xticklabels(configs, rotation=45, ha="right")
- ax.legend()
- ax.grid(True, alpha=0.3)
-
- # Add value labels
- for bars in [baseline_bars, dev_bars]:
- for bar in bars:
- height = bar.get_height()
- if height > 0:
- ax.annotate(
- f"{height:.1f}",
- xy=(bar.get_x() + bar.get_width() / 2, height),
- xytext=(0, 3),
- textcoords="offset points",
- ha="center",
- va="bottom",
- fontsize=8,
- )
+ # Group data by filesystem configuration
+ fs_performance = defaultdict(lambda: {
+ "query_data": [],
+ "config_key": "",
+ })
+ for result in results:
+ fs_type, block_size, config_key = _extract_filesystem_config(result)
+
+ query_perf = result.get("query_performance", {})
+ for topk, topk_data in query_perf.items():
+ for batch, batch_data in topk_data.items():
+ qps = batch_data.get("queries_per_second", 0)
+ fs_performance[config_key]["query_data"].append({
+ "topk": topk,
+ "batch": batch,
+ "qps": qps,
+ })
+ fs_performance[config_key]["config_key"] = config_key
+
+ # Check if we have multi-filesystem data
+ if len(fs_performance) > 1:
+ # Multi-filesystem mode: separate heatmaps for each filesystem
+ num_fs = len(fs_performance)
+        fig, axes = plt.subplots(1, num_fs, figsize=(5 * num_fs, 6))
+ if num_fs == 1:
+ axes = [axes]
+
+ # Define common structure for consistency
+ topk_order = ["topk_1", "topk_10", "topk_100"]
+ batch_order = ["batch_1", "batch_10", "batch_100"]
+
+ for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
+ # Create matrix for this filesystem
+ matrix = np.zeros((len(topk_order), len(batch_order)))
+
+ # Fill matrix with data
+ query_dict = {}
+ for item in perf_data["query_data"]:
+ query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+ for i, topk in enumerate(topk_order):
+ for j, batch in enumerate(batch_order):
+ matrix[i, j] = query_dict.get((topk, batch), 0)
+
+ # Plot heatmap
+            im = axes[idx].imshow(matrix, cmap="viridis", aspect="auto")
+ axes[idx].set_title(f"{config_key.upper()} Query Performance")
+ axes[idx].set_xticks(range(len(batch_order)))
+ axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+ axes[idx].set_yticks(range(len(topk_order)))
+ axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+ # Add text annotations
+ for i in range(len(topk_order)):
+ for j in range(len(batch_order)):
+                    axes[idx].text(j, i, f"{matrix[i, j]:.0f}",
+                        ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=axes[idx])
+            cbar.set_label("Queries Per Second (QPS)")
+ else:
+ # Single filesystem mode
+ fig, ax = plt.subplots(1, 1, figsize=(8, 6))
+
+ if fs_performance:
+ config_key = list(fs_performance.keys())[0]
+ perf_data = fs_performance[config_key]
+
+ # Create matrix
+ topk_order = ["topk_1", "topk_10", "topk_100"]
+ batch_order = ["batch_1", "batch_10", "batch_100"]
+ matrix = np.zeros((len(topk_order), len(batch_order)))
+
+ # Fill matrix with data
+ query_dict = {}
+ for item in perf_data["query_data"]:
+ query_dict[(item["topk"], item["batch"])] = item["qps"]
+
+ for i, topk in enumerate(topk_order):
+ for j, batch in enumerate(batch_order):
+ matrix[i, j] = query_dict.get((topk, batch), 0)
+
+ # Plot heatmap
+            im = ax.imshow(matrix, cmap="viridis", aspect="auto")
+ ax.set_title("Milvus Query Performance Heatmap")
+ ax.set_xticks(range(len(batch_order)))
+ ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
+ ax.set_yticks(range(len(topk_order)))
+ ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
+
+ # Add text annotations
+ for i in range(len(topk_order)):
+ for j in range(len(batch_order)):
+                    ax.text(j, i, f"{matrix[i, j]:.0f}",
+                        ha="center", va="center", color="white", fontweight="bold")
+
+            # Add colorbar
+            cbar = plt.colorbar(im, ax=ax)
+            cbar.set_label("Queries Per Second (QPS)")
+
plt.tight_layout()
- plt.savefig(os.path.join(output_dir, "filesystem_latency_comparison.png"), dpi=150)
+ plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
plt.close()
@@ -1119,56 +340,23 @@ def main():
results_dir = sys.argv[1]
output_dir = sys.argv[2]
- # Create output directory
+ # Ensure output directory exists
os.makedirs(output_dir, exist_ok=True)
# Load results
results = load_results(results_dir)
-
if not results:
- print("No results found to analyze")
+ print(f"No valid results found in {results_dir}")
sys.exit(1)
print(f"Loaded {len(results)} result files")
# Generate graphs
- print("Generating performance heatmap...")
- create_heatmap_analysis(results, output_dir)
-
- print("Generating performance trends...")
create_simple_performance_trends(results, output_dir)
+ create_heatmap_analysis(results, output_dir)
- print("Generating summary statistics...")
- summary = generate_summary_statistics(results, output_dir)
-
- # Check if we have multiple filesystems to compare
- filesystems = set(r.get("filesystem", "unknown") for r in results)
- if len(filesystems) > 1:
- print("Generating filesystem comparison chart...")
- create_filesystem_comparison_chart(results, output_dir)
-
- print("Generating comprehensive filesystem comparison...")
- create_comprehensive_fs_comparison(results, output_dir)
-
- print("Generating filesystem latency comparison...")
- create_fs_latency_comparison(results, output_dir)
-
- # Check if we have XFS results with different block sizes
- xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
- block_sizes = set(r.get("block_size", "unknown") for r in xfs_results)
- if len(block_sizes) > 1:
- print("Generating XFS block size analysis...")
- create_block_size_analysis(results, output_dir)
-
- print(f"\nAnalysis complete! Graphs saved to {output_dir}")
- print(f"Total configurations tested: {summary['total_tests']}")
- print(
- f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
- )
- print(
- f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
- )
+ print(f"Graphs generated in {output_dir}")
if __name__ == "__main__":
- main()
+ main()
\ No newline at end of file
diff --git a/workflows/ai/scripts/generate_html_report.py b/workflows/ai/scripts/generate_html_report.py
index 3aa8342f..01ec734c 100755
--- a/workflows/ai/scripts/generate_html_report.py
+++ b/workflows/ai/scripts/generate_html_report.py
@@ -180,7 +180,7 @@ HTML_TEMPLATE = """
</head>
<body>
<div class="header">
- <h1>AI Vector Database Benchmark Results</h1>
+ <h1>Milvus Vector Database Benchmark Results</h1>
<div class="subtitle">Generated on {timestamp}</div>
</div>
@@ -238,11 +238,13 @@ HTML_TEMPLATE = """
</div>
<div id="detailed-results" class="section">
- <h2>Detailed Results Table</h2>
+ <h2>Milvus Performance by Storage Filesystem</h2>
+ <p>This table shows how Milvus vector database performs when its data is stored on different filesystem types and configurations.</p>
<table class="results-table">
<thead>
<tr>
- <th>Host</th>
+ <th>Filesystem</th>
+ <th>Configuration</th>
<th>Type</th>
<th>Insert QPS</th>
                         <th>Query QPS</th>
+                        <th>Node</th>
@@ -293,27 +295,53 @@ def load_results(results_dir):
# Get filesystem from JSON data
fs_type = data.get("filesystem", None)
- # If not in JSON, try to parse from filename (backwards compatibility)
- if not fs_type and "debian13-ai" in filename:
- host_parts = (
- filename.replace("results_debian13-ai-", "")
- .replace("_1.json", "")
+ # Always try to parse from filename first since JSON data might be wrong
+ if "-ai-" in filename:
+ # Handle both debian13-ai- and prod-ai- prefixes
+ cleaned_filename = filename.replace("results_", "")
+
+ # Extract the part after -ai-
+ if "debian13-ai-" in cleaned_filename:
+ host_part = cleaned_filename.replace("debian13-ai-", "")
+ elif "prod-ai-" in cleaned_filename:
+ host_part = cleaned_filename.replace("prod-ai-", "")
+ else:
+ # Generic extraction
+ ai_index = cleaned_filename.find("-ai-")
+ if ai_index != -1:
+ host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-"
+ else:
+ host_part = cleaned_filename
+
+ # Remove file extensions and dev suffix
+ host_part = (
+ host_part.replace("_1.json", "")
.replace("_2.json", "")
.replace("_3.json", "")
- .split("-")
+ .replace("-dev", "")
)
- if "xfs" in host_parts[0]:
+
+ # Parse filesystem type and block size
+ if host_part.startswith("xfs-"):
fs_type = "xfs"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "ext4" in host_parts[0]:
+ # Extract block size: xfs-4k-4ks -> 4k
+ parts = host_part.split("-")
+ if len(parts) >= 2:
+ block_size = parts[1] # 4k, 16k, 32k, 64k
+ else:
+ block_size = "4k"
+ elif host_part.startswith("ext4-"):
fs_type = "ext4"
- block_size = host_parts[1] if len(host_parts) > 1 else "4k"
- elif "btrfs" in host_parts[0]:
+ parts = host_part.split("-")
+ block_size = parts[1] if len(parts) > 1 else "4k"
+ elif host_part.startswith("btrfs"):
fs_type = "btrfs"
block_size = "default"
else:
- fs_type = "unknown"
- block_size = "unknown"
+ # Fallback to JSON data if available
+ if not fs_type:
+ fs_type = "unknown"
+ block_size = "unknown"
else:
# Set appropriate block size based on filesystem
if fs_type == "btrfs":
@@ -371,12 +399,36 @@ def generate_table_rows(results, best_configs):
if config_key in best_configs:
row_class += " best-config"
+ # Generate descriptive labels showing Milvus is running on this filesystem
+ if result["filesystem"] == "xfs" and result["block_size"] != "default":
+ storage_label = f"XFS {result['block_size'].upper()}"
+ config_details = f"Block size: {result['block_size']}, Milvus data on XFS"
+ elif result["filesystem"] == "ext4":
+ storage_label = "EXT4"
+ if "bigalloc" in result.get("host", "").lower():
+ config_details = "EXT4 with bigalloc, Milvus data on ext4"
+ else:
+ config_details = (
+ f"Block size: {result['block_size']}, Milvus data on ext4"
+ )
+ elif result["filesystem"] == "btrfs":
+ storage_label = "BTRFS"
+ config_details = "Default Btrfs settings, Milvus data on Btrfs"
+ else:
+ storage_label = result["filesystem"].upper()
+ config_details = f"Milvus data on {result['filesystem']}"
+
+ # Extract clean node identifier from hostname
+ node_name = result["host"].replace("results_", "").replace(".json", "")
+
row = f"""
<tr class="{row_class}">
- <td>{result['host']}</td>
+ <td><strong>{storage_label}</strong></td>
+ <td>{config_details}</td>
<td>{result['type']}</td>
<td>{result['insert_qps']:,}</td>
<td>{result['query_qps']:,}</td>
+ <td><code>{node_name}</code></td>
<td>{result['timestamp']}</td>
</tr>
"""
@@ -483,8 +535,8 @@ def generate_html_report(results_dir, graphs_dir, output_path):
<li><a href="#block-size-analysis">Block Size Analysis</a></li>"""
filesystem_comparison_section = """<div id="filesystem-comparison" class="section">
- <h2>Filesystem Performance Comparison</h2>
- <p>Comparison of vector database performance across different filesystems, showing both baseline and development kernel results.</p>
+ <h2>Milvus Storage Filesystem Comparison</h2>
+ <p>Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.</p>
<div class="graph-container">
<img src="graphs/filesystem_comparison.png" alt="Filesystem Comparison">
</div>
@@ -499,9 +551,9 @@ def generate_html_report(results_dir, graphs_dir, output_path):
</div>"""
# Multi-fs mode: show filesystem info
- fourth_card_title = "Filesystems Tested"
+ fourth_card_title = "Storage Filesystems"
fourth_card_value = str(len(filesystems_tested))
- fourth_card_label = ", ".join(filesystems_tested).upper()
+ fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data"
else:
# Single filesystem mode - hide multi-fs sections
filesystem_nav_items = ""
--
2.50.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
2025-08-27 9:32 ` [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks Luis Chamberlain
@ 2025-08-27 14:47 ` Chuck Lever
2025-08-27 19:24 ` Luis Chamberlain
2025-09-01 20:11 ` Daniel Gomez
1 sibling, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2025-08-27 14:47 UTC (permalink / raw)
To: Luis Chamberlain, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
On 8/27/25 5:32 AM, Luis Chamberlain wrote:
> Extend the AI workflow to support testing Milvus across multiple
> filesystem configurations simultaneously. This enables comprehensive
> performance comparisons between different filesystems and their
> configuration options.
>
> Key features:
> - Dynamic node generation based on enabled filesystem configurations
> - Support for XFS, EXT4, and BTRFS with various mount options
> - Per-filesystem result collection and analysis
> - A/B testing across all filesystem configurations
> - Automated comparison graphs between filesystems
>
> Filesystem configurations:
> - XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
> - EXT4: default, nojournal, bigalloc configurations
> - BTRFS: default, zlib, lzo, zstd compression options
>
> Defconfigs:
> - ai-milvus-multifs: Test 7 filesystem configs with A/B testing
> - ai-milvus-multifs-distro: Test with distribution kernels
> - ai-milvus-multifs-extended: Extended configs (14 filesystems total)
>
> Node generation:
> The system dynamically generates nodes based on enabled filesystem
> configurations. With A/B testing enabled, this creates baseline and
> dev nodes for each filesystem (e.g., debian13-ai-xfs-4k and
> debian13-ai-xfs-4k-dev).
>
> Usage:
> make defconfig-ai-milvus-multifs
> make bringup # Creates nodes for each filesystem
> make ai # Setup infrastructure on all nodes
> make ai-tests # Run benchmarks on all filesystems
> make ai-results # Collect and compare results
>
> This enables systematic evaluation of how different filesystems and
> their configurations affect vector database performance.
>
> Generated-by: Claude AI
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Hey Luis -
I'm looking at adding "AI optimized" and "GPU optimized" machine size
choices in the cloud provider Kconfigs. I assume this patch set can take
advantage of those. Any suggestions, or let me know if I'm way off base.
> ---
> .github/workflows/docker-tests.yml | 6 +
> Makefile | 2 +-
> defconfigs/ai-milvus-multifs | 67 +
> defconfigs/ai-milvus-multifs-distro | 109 ++
> defconfigs/ai-milvus-multifs-extended | 108 ++
> docs/ai/vector-databases/README.md | 1 -
> playbooks/ai_install.yml | 6 +
> playbooks/ai_multifs.yml | 24 +
> .../host_vars/debian13-ai-xfs-4k-4ks.yml | 10 -
> .../files/analyze_results.py | 1132 +++++++++++---
> .../files/generate_better_graphs.py | 16 +-
> .../files/generate_graphs.py | 888 ++++-------
> .../files/generate_html_report.py | 263 +++-
> .../roles/ai_collect_results/tasks/main.yml | 42 +-
> .../templates/analysis_config.json.j2 | 2 +-
> .../roles/ai_milvus_storage/tasks/main.yml | 161 ++
> .../tasks/generate_comparison.yml | 279 ++++
> playbooks/roles/ai_multifs_run/tasks/main.yml | 23 +
> .../tasks/run_single_filesystem.yml | 104 ++
> .../templates/milvus_config.json.j2 | 42 +
> .../roles/ai_multifs_setup/defaults/main.yml | 49 +
> .../roles/ai_multifs_setup/tasks/main.yml | 70 +
> .../files/milvus_benchmark.py | 164 +-
> playbooks/roles/gen_hosts/tasks/main.yml | 19 +
> .../roles/gen_hosts/templates/fstests.j2 | 2 +
> playbooks/roles/gen_hosts/templates/gitr.j2 | 2 +
> playbooks/roles/gen_hosts/templates/hosts.j2 | 35 +-
> .../roles/gen_hosts/templates/nfstest.j2 | 2 +
> playbooks/roles/gen_hosts/templates/pynfs.j2 | 2 +
> playbooks/roles/gen_nodes/tasks/main.yml | 90 ++
> .../roles/guestfs/tasks/bringup/main.yml | 15 +
> scripts/guestfs.Makefile | 2 +-
> workflows/ai/Kconfig | 13 +
> workflows/ai/Kconfig.fs | 118 ++
> workflows/ai/Kconfig.multifs | 184 +++
> workflows/ai/scripts/analysis_config.json | 2 +-
> workflows/ai/scripts/analyze_results.py | 1132 +++++++++++---
> workflows/ai/scripts/generate_graphs.py | 1372 ++++-------------
> workflows/ai/scripts/generate_html_report.py | 94 +-
> 39 files changed, 4356 insertions(+), 2296 deletions(-)
> create mode 100644 defconfigs/ai-milvus-multifs
> create mode 100644 defconfigs/ai-milvus-multifs-distro
> create mode 100644 defconfigs/ai-milvus-multifs-extended
> create mode 100644 playbooks/ai_multifs.yml
> delete mode 100644 playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
> create mode 100644 playbooks/roles/ai_milvus_storage/tasks/main.yml
> create mode 100644 playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
> create mode 100644 playbooks/roles/ai_multifs_run/tasks/main.yml
> create mode 100644 playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
> create mode 100644 playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
> create mode 100644 playbooks/roles/ai_multifs_setup/defaults/main.yml
> create mode 100644 playbooks/roles/ai_multifs_setup/tasks/main.yml
> create mode 100644 workflows/ai/Kconfig.fs
> create mode 100644 workflows/ai/Kconfig.multifs
>
> diff --git a/.github/workflows/docker-tests.yml b/.github/workflows/docker-tests.yml
> index c0e0d03d..adea1182 100644
> --- a/.github/workflows/docker-tests.yml
> +++ b/.github/workflows/docker-tests.yml
> @@ -53,3 +53,9 @@ jobs:
> echo "Running simple make targets on ${{ matrix.distro_container }} environment"
> make mrproper
>
> + - name: Test fio-tests defconfig
> + run: |
> + echo "Testing fio-tests CI configuration"
> + make defconfig-fio-tests-ci
> + make
> + echo "Configuration test passed for fio-tests"
> diff --git a/Makefile b/Makefile
> index 8755577e..83c67340 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -226,7 +226,7 @@ include scripts/bringup.Makefile
> endif
>
> DEFAULT_DEPS += $(ANSIBLE_INVENTORY_FILE)
> -$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE)
> +$(ANSIBLE_INVENTORY_FILE): .config $(ANSIBLE_CFG_FILE) $(KDEVOPS_HOSTS_TEMPLATE) $(KDEVOPS_NODES)
> $(Q)ANSIBLE_LOCALHOST_WARNING=False ANSIBLE_INVENTORY_UNPARSED_WARNING=False \
> ansible-playbook $(ANSIBLE_VERBOSE) \
> $(KDEVOPS_PLAYBOOKS_DIR)/gen_hosts.yml \
> diff --git a/defconfigs/ai-milvus-multifs b/defconfigs/ai-milvus-multifs
> new file mode 100644
> index 00000000..7e5ad971
> --- /dev/null
> +++ b/defconfigs/ai-milvus-multifs
> @@ -0,0 +1,67 @@
> +CONFIG_GUESTFS=y
> +CONFIG_LIBVIRT=y
> +
> +# Disable mirror features for CI/testing
> +# CONFIG_ENABLE_LOCAL_LINUX_MIRROR is not set
> +# CONFIG_USE_LOCAL_LINUX_MIRROR is not set
> +# CONFIG_INSTALL_ONLY_GIT_DAEMON is not set
> +# CONFIG_MIRROR_INSTALL is not set
> +
> +CONFIG_WORKFLOWS=y
> +CONFIG_WORKFLOW_LINUX_CUSTOM=y
> +
> +CONFIG_BOOTLINUX=y
> +CONFIG_BOOTLINUX_9P=y
> +
> +# Enable A/B testing with different kernel references
> +CONFIG_KDEVOPS_BASELINE_AND_DEV=y
> +CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
> +
> +# AI workflow configuration
> +CONFIG_WORKFLOWS_TESTS=y
> +CONFIG_WORKFLOWS_LINUX_TESTS=y
> +CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
> +CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
> +
> +# Vector database configuration
> +CONFIG_AI_TESTS_VECTOR_DATABASE=y
> +CONFIG_AI_VECTOR_DB_MILVUS=y
> +CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
> +
> +# Enable multi-filesystem testing
> +CONFIG_AI_MULTIFS_ENABLE=y
> +CONFIG_AI_ENABLE_MULTIFS_TESTING=y
> +
> +# Enable dedicated Milvus storage with node-based filesystem
> +CONFIG_AI_MILVUS_STORAGE_ENABLE=y
> +CONFIG_AI_MILVUS_USE_NODE_FS=y
> +
> +# Test XFS with different block sizes
> +CONFIG_AI_MULTIFS_TEST_XFS=y
> +CONFIG_AI_MULTIFS_XFS_4K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_16K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_32K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_64K_4KS=y
> +
> +# Test EXT4 configurations
> +CONFIG_AI_MULTIFS_TEST_EXT4=y
> +CONFIG_AI_MULTIFS_EXT4_4K=y
> +CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
> +
> +# Test BTRFS
> +CONFIG_AI_MULTIFS_TEST_BTRFS=y
> +CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
> +
> +# Performance settings
> +CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
> +CONFIG_AI_BENCHMARK_ITERATIONS=5
> +
> +# Dataset configuration for benchmarking
> +CONFIG_AI_VECTOR_DB_MILVUS_DATASET_SIZE=100000
> +CONFIG_AI_VECTOR_DB_MILVUS_BATCH_SIZE=10000
> +CONFIG_AI_VECTOR_DB_MILVUS_NUM_QUERIES=10000
> +
> +# Container configuration
> +CONFIG_AI_VECTOR_DB_MILVUS_CONTAINER_IMAGE_2_5=y
> +CONFIG_AI_VECTOR_DB_MILVUS_MEMORY_LIMIT="8g"
> +CONFIG_AI_VECTOR_DB_MILVUS_CPU_LIMIT="4.0"
> \ No newline at end of file
> diff --git a/defconfigs/ai-milvus-multifs-distro b/defconfigs/ai-milvus-multifs-distro
> new file mode 100644
> index 00000000..fb71f2b5
> --- /dev/null
> +++ b/defconfigs/ai-milvus-multifs-distro
> @@ -0,0 +1,109 @@
> +# AI Multi-Filesystem Performance Testing Configuration (Distro Kernel)
> +# This configuration enables testing AI workloads across multiple filesystem
> +# configurations including XFS (4k, 16k, 32k and 64k block sizes), ext4 (4k and 16k bigalloc),
> +# and btrfs (default profile) using the distribution kernel without A/B testing.
> +
> +# Base virtualization setup
> +CONFIG_LIBVIRT=y
> +CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
> +CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
> +CONFIG_LIBVIRT_ENABLE_LARGEIO=y
> +CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
> +CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
> +
> +# Network configuration
> +CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
> +CONFIG_LIBVIRT_NET_NAME="kdevops"
> +
> +# Host configuration
> +CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
> +CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
> +
> +# Base system requirements
> +CONFIG_WORKFLOWS=y
> +CONFIG_WORKFLOWS_TESTS=y
> +CONFIG_WORKFLOWS_LINUX_TESTS=y
> +CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
> +CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
> +
> +# AI Workflow Configuration
> +CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
> +CONFIG_AI_TESTS_VECTOR_DATABASE=y
> +CONFIG_AI_MILVUS_DOCKER=y
> +CONFIG_AI_VECTOR_DB_TYPE_MILVUS=y
> +
> +# Milvus Configuration
> +CONFIG_AI_MILVUS_HOST="localhost"
> +CONFIG_AI_MILVUS_PORT=19530
> +CONFIG_AI_MILVUS_DATABASE_NAME="ai_benchmark"
> +
> +# Test Parameters (optimized for multi-fs testing)
> +CONFIG_AI_BENCHMARK_ITERATIONS=3
> +CONFIG_AI_DATASET_1M=y
> +CONFIG_AI_VECTOR_DIM_128=y
> +CONFIG_AI_BENCHMARK_RUNTIME="180"
> +CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
> +
> +# Query patterns
> +CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
> +CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
> +
> +# Batch sizes
> +CONFIG_AI_BENCHMARK_BATCH_1=y
> +CONFIG_AI_BENCHMARK_BATCH_10=y
> +
> +# Index configuration
> +CONFIG_AI_INDEX_HNSW=y
> +CONFIG_AI_INDEX_TYPE="HNSW"
> +CONFIG_AI_INDEX_HNSW_M=16
> +CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
> +CONFIG_AI_INDEX_HNSW_EF=64
> +
> +# Results and visualization
> +CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
> +CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
> +CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
> +CONFIG_AI_BENCHMARK_GRAPH_DPI=300
> +CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
> +
> +# Multi-filesystem testing configuration
> +CONFIG_AI_ENABLE_MULTIFS_TESTING=y
> +CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
> +
> +# Enable dedicated Milvus storage with node-based filesystem
> +CONFIG_AI_MILVUS_STORAGE_ENABLE=y
> +CONFIG_AI_MILVUS_USE_NODE_FS=y
> +CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
> +CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
> +
> +# XFS configurations
> +CONFIG_AI_MULTIFS_TEST_XFS=y
> +CONFIG_AI_MULTIFS_XFS_4K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_16K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_32K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_64K_4KS=y
> +
> +# ext4 configurations
> +CONFIG_AI_MULTIFS_TEST_EXT4=y
> +CONFIG_AI_MULTIFS_EXT4_4K=y
> +CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
> +
> +# btrfs configurations
> +CONFIG_AI_MULTIFS_TEST_BTRFS=y
> +CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
> +
> +# Standard filesystem configuration (for comparison)
> +CONFIG_AI_FILESYSTEM_XFS=y
> +CONFIG_AI_FILESYSTEM="xfs"
> +CONFIG_AI_FSTYPE="xfs"
> +CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
> +CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> +
> +# Use distribution kernel (no kernel building)
> +# CONFIG_BOOTLINUX is not set
> +
> +# Memory configuration
> +CONFIG_LIBVIRT_MEM_MB=16384
> +
> +# Disable A/B testing to use single baseline configuration
> +# CONFIG_KDEVOPS_BASELINE_AND_DEV is not set
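
For the 1M-vector dataset configured here, the raw vector payload alone is
small relative to the 50GiB extra NVMe drive above; a rough estimate assuming
4-byte float32 components (index structures and Milvus/MinIO metadata not
included):

  vectors = 1_000_000   # CONFIG_AI_DATASET_1M
  dim = 128             # CONFIG_AI_VECTOR_DIM_128
  print(f"{vectors * dim * 4 / 2**20:.0f} MiB")   # ~488 MiB of raw vectors
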
> diff --git a/defconfigs/ai-milvus-multifs-extended b/defconfigs/ai-milvus-multifs-extended
> new file mode 100644
> index 00000000..7886c8c4
> --- /dev/null
> +++ b/defconfigs/ai-milvus-multifs-extended
> @@ -0,0 +1,108 @@
> +# AI Extended Multi-Filesystem Performance Testing Configuration (Distro Kernel)
> +# This configuration enables testing AI workloads across multiple filesystem
> +# configurations including XFS (4k, 16k, 32k, 64k block sizes), ext4 (4k and 16k bigalloc),
> +# and btrfs (default profile), with baseline and dev nodes enabled for A/B comparison.
> +
> +# Base virtualization setup
> +CONFIG_LIBVIRT=y
> +CONFIG_LIBVIRT_MACHINE_TYPE_Q35=y
> +CONFIG_LIBVIRT_STORAGE_POOL_PATH="/opt/kdevops/libvirt"
> +CONFIG_LIBVIRT_ENABLE_LARGEIO=y
> +CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME=y
> +CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_SIZE="50GiB"
> +
> +# Network configuration
> +CONFIG_LIBVIRT_ENABLE_BRIDGED_NETWORKING=y
> +CONFIG_LIBVIRT_NET_NAME="kdevops"
> +
> +# Host configuration
> +CONFIG_KDEVOPS_HOSTS_TEMPLATE="hosts.j2"
> +CONFIG_VAGRANT_NVME_DISK_SIZE="50GiB"
> +
> +# Base system requirements
> +CONFIG_WORKFLOWS=y
> +CONFIG_WORKFLOWS_TESTS=y
> +CONFIG_WORKFLOWS_LINUX_TESTS=y
> +CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
> +CONFIG_KDEVOPS_WORKFLOW_DEDICATE_AI=y
> +
> +# AI Workflow Configuration
> +CONFIG_KDEVOPS_WORKFLOW_ENABLE_AI=y
> +CONFIG_AI_TESTS_VECTOR_DATABASE=y
> +CONFIG_AI_VECTOR_DB_MILVUS=y
> +CONFIG_AI_VECTOR_DB_MILVUS_DOCKER=y
> +
> +# Test Parameters (optimized for multi-fs testing)
> +CONFIG_AI_BENCHMARK_ITERATIONS=3
> +CONFIG_AI_DATASET_1M=y
> +CONFIG_AI_VECTOR_DIM_128=y
> +CONFIG_AI_BENCHMARK_RUNTIME="180"
> +CONFIG_AI_BENCHMARK_WARMUP_TIME="30"
> +
> +# Query patterns
> +CONFIG_AI_BENCHMARK_QUERY_TOPK_1=y
> +CONFIG_AI_BENCHMARK_QUERY_TOPK_10=y
> +
> +# Batch sizes
> +CONFIG_AI_BENCHMARK_BATCH_1=y
> +CONFIG_AI_BENCHMARK_BATCH_10=y
> +
> +# Index configuration
> +CONFIG_AI_INDEX_HNSW=y
> +CONFIG_AI_INDEX_TYPE="HNSW"
> +CONFIG_AI_INDEX_HNSW_M=16
> +CONFIG_AI_INDEX_HNSW_EF_CONSTRUCTION=200
> +CONFIG_AI_INDEX_HNSW_EF=64
> +
> +# Results and visualization
> +CONFIG_AI_BENCHMARK_RESULTS_DIR="/data/ai-benchmark"
> +CONFIG_AI_BENCHMARK_ENABLE_GRAPHING=y
> +CONFIG_AI_BENCHMARK_GRAPH_FORMAT="png"
> +CONFIG_AI_BENCHMARK_GRAPH_DPI=300
> +CONFIG_AI_BENCHMARK_GRAPH_THEME="default"
> +
> +# Multi-filesystem testing configuration
> +CONFIG_AI_MULTIFS_ENABLE=y
> +CONFIG_AI_ENABLE_MULTIFS_TESTING=y
> +CONFIG_AI_MULTIFS_RESULTS_DIR="/data/ai-multifs-benchmark"
> +
> +# Enable dedicated Milvus storage with node-based filesystem
> +CONFIG_AI_MILVUS_STORAGE_ENABLE=y
> +CONFIG_AI_MILVUS_USE_NODE_FS=y
> +CONFIG_AI_MILVUS_DEVICE="/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3"
> +CONFIG_AI_MILVUS_MOUNT_POINT="/data/milvus"
> +
> +# Extended XFS configurations (4k, 16k, 32k, 64k block sizes)
> +CONFIG_AI_MULTIFS_TEST_XFS=y
> +CONFIG_AI_MULTIFS_XFS_4K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_16K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_32K_4KS=y
> +CONFIG_AI_MULTIFS_XFS_64K_4KS=y
> +
> +# ext4 configurations
> +CONFIG_AI_MULTIFS_TEST_EXT4=y
> +CONFIG_AI_MULTIFS_EXT4_4K=y
> +CONFIG_AI_MULTIFS_EXT4_16K_BIGALLOC=y
> +
> +# btrfs configurations
> +CONFIG_AI_MULTIFS_TEST_BTRFS=y
> +CONFIG_AI_MULTIFS_BTRFS_DEFAULT=y
> +
> +# Standard filesystem configuration (for comparison)
> +CONFIG_AI_FILESYSTEM_XFS=y
> +CONFIG_AI_FILESYSTEM="xfs"
> +CONFIG_AI_FSTYPE="xfs"
> +CONFIG_AI_XFS_MKFS_OPTS="-f -s size=4096"
> +CONFIG_AI_XFS_MOUNT_OPTS="rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> +
> +# Use distribution kernel (no kernel building)
> +# CONFIG_BOOTLINUX is not set
> +
> +# Memory configuration
> +CONFIG_LIBVIRT_MEM_MB=16384
> +
> +# Baseline/dev testing setup
> +CONFIG_KDEVOPS_BASELINE_AND_DEV=y
> +# Build Linux
> +CONFIG_WORKFLOW_LINUX_CUSTOM=y
> +CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
> diff --git a/docs/ai/vector-databases/README.md b/docs/ai/vector-databases/README.md
> index 2a3955d7..0fdd204b 100644
> --- a/docs/ai/vector-databases/README.md
> +++ b/docs/ai/vector-databases/README.md
> @@ -52,7 +52,6 @@ Vector databases heavily depend on storage performance. The workflow tests acros
> - **XFS**: Default for many production deployments
> - **ext4**: Traditional Linux filesystem
> - **btrfs**: Copy-on-write with compression support
> -- **ZFS**: Advanced features for data integrity
>
> ## Configuration Dimensions
>
> diff --git a/playbooks/ai_install.yml b/playbooks/ai_install.yml
> index 70b734e4..38e6671c 100644
> --- a/playbooks/ai_install.yml
> +++ b/playbooks/ai_install.yml
> @@ -4,5 +4,11 @@
> become: true
> become_user: root
> roles:
> + - role: ai_docker_storage
> + when: ai_docker_storage_enable | default(true)
> + tags: ['ai', 'docker', 'storage']
> + - role: ai_milvus_storage
> + when: ai_milvus_storage_enable | default(false)
> + tags: ['ai', 'milvus', 'storage']
> - role: milvus
> tags: ['ai', 'vector_db', 'milvus', 'install']
> diff --git a/playbooks/ai_multifs.yml b/playbooks/ai_multifs.yml
> new file mode 100644
> index 00000000..637f11f4
> --- /dev/null
> +++ b/playbooks/ai_multifs.yml
> @@ -0,0 +1,24 @@
> +---
> +- hosts: baseline
> + become: yes
> + gather_facts: yes
> + vars:
> + ai_benchmark_results_dir: "{{ ai_multifs_results_dir | default('/data/ai-multifs-benchmark') }}"
> + roles:
> + - role: ai_multifs_setup
> + - role: ai_multifs_run
> + tasks:
> + - name: Final multi-filesystem testing summary
> + debug:
> + msg: |
> + Multi-filesystem AI benchmark testing completed!
> +
> + Results directory: {{ ai_multifs_results_dir }}
> + Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
> +
> + Individual filesystem results:
> + {% for config in ai_multifs_configurations %}
> + {% if config.enabled %}
> + - {{ config.name }}: {{ ai_multifs_results_dir }}/{{ config.name }}/
> + {% endif %}
> + {% endfor %}
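
The summary task iterates ai_multifs_configurations, which is not defined in
this hunk; presumably the ai_multifs_setup role supplies it as a list of
entries with at least name and enabled keys. A minimal sketch of the assumed
shape and of what the loop would print, written as Python purely for
illustration (the real definition would live in the role defaults):

  # assumed structure; names follow the host_vars added in this series
  ai_multifs_configurations = [
      {"name": "xfs-4k-4ks", "enabled": True},
      {"name": "ext4-16k-bigalloc", "enabled": True},
      {"name": "btrfs-default", "enabled": False},
  ]
  results_dir = "/data/ai-multifs-benchmark"
  for config in ai_multifs_configurations:
      if config["enabled"]:
          print(f"- {config['name']}: {results_dir}/{config['name']}/")
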
> diff --git a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml b/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
> deleted file mode 100644
> index ffe9eb28..00000000
> --- a/playbooks/host_vars/debian13-ai-xfs-4k-4ks.yml
> +++ /dev/null
> @@ -1,10 +0,0 @@
> ----
> -# XFS 4k block, 4k sector configuration
> -ai_docker_fstype: "xfs"
> -ai_docker_xfs_blocksize: 4096
> -ai_docker_xfs_sectorsize: 4096
> -ai_docker_xfs_mkfs_opts: ""
> -filesystem_type: "xfs"
> -filesystem_block_size: "4k-4ks"
> -ai_filesystem: "xfs"
> -ai_data_device_path: "/var/lib/docker"
> \ No newline at end of file
> diff --git a/playbooks/roles/ai_collect_results/files/analyze_results.py b/playbooks/roles/ai_collect_results/files/analyze_results.py
> index 3d11fb11..2dc4a1d6 100755
> --- a/playbooks/roles/ai_collect_results/files/analyze_results.py
> +++ b/playbooks/roles/ai_collect_results/files/analyze_results.py
> @@ -226,6 +226,68 @@ class ResultsAnalyzer:
>
> return fs_info
>
> + def _extract_filesystem_config(
> + self, result: Dict[str, Any]
> + ) -> tuple[str, str, str]:
> + """Extract filesystem type and block size from result data.
> + Returns (fs_type, block_size, config_key)"""
> + filename = result.get("_file", "")
> +
> + # Primary: Extract filesystem type from filename (more reliable than JSON)
> + fs_type = "unknown"
> + block_size = "default"
> +
> + if "xfs" in filename:
> + fs_type = "xfs"
> +            # Check larger block sizes first: "4k-" is a substring of "64k-"
> +            if "64k-" in filename:
> +                block_size = "64k"
> +            elif "32k-" in filename:
> +                block_size = "32k"
> +            elif "16k-" in filename:
> +                block_size = "16k"
> +            elif "4k-" in filename:
> +                block_size = "4k"
> + elif "ext4" in filename:
> + fs_type = "ext4"
> + if "16k" in filename:
> + block_size = "16k"
> + elif "4k" in filename:
> + block_size = "4k"
> + elif "btrfs" in filename:
> + fs_type = "btrfs"
> + block_size = "default"
> + else:
> + # Fallback to JSON data if filename parsing fails
> + fs_type = result.get("filesystem", "unknown")
> + self.logger.warning(
> + f"Could not determine filesystem from filename {filename}, using JSON data: {fs_type}"
> + )
> +
> + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
> + return fs_type, block_size, config_key
> +
> + def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
> + """Extract node hostname and determine if it's a dev node.
> + Returns (hostname, is_dev_node)"""
> + # Get hostname from system_info (preferred) or fall back to filename
> + system_info = result.get("system_info", {})
> + hostname = system_info.get("hostname", "")
> +
> + # If no hostname in system_info, try extracting from filename
> + if not hostname:
> + filename = result.get("_file", "")
> + # Remove results_ prefix and .json suffix
> + hostname = filename.replace("results_", "").replace(".json", "")
> + # Remove iteration number if present (_1, _2, etc.)
> + if "_" in hostname and hostname.split("_")[-1].isdigit():
> + hostname = "_".join(hostname.split("_")[:-1])
> +
> + # Determine if this is a dev node
> + is_dev = hostname.endswith("-dev")
> +
> + return hostname, is_dev
> +
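
For reference, with the results_<node>_<iteration>.json naming used elsewhere
in this series, results_debian13-ai-xfs-16k-4ks-dev_2.json should come out of
these two helpers as hostname debian13-ai-xfs-16k-4ks-dev, is_dev True and
config key "xfs-16k". A standalone sketch of the hostname fallback path (same
logic as above, just outside the class):

  def parse(filename):
      host = filename.replace("results_", "").replace(".json", "")
      if "_" in host and host.split("_")[-1].isdigit():
          host = "_".join(host.split("_")[:-1])
      return host, host.endswith("-dev")

  print(parse("results_debian13-ai-xfs-16k-4ks-dev_2.json"))
  # ('debian13-ai-xfs-16k-4ks-dev', True)
  print(parse("results_debian13-ai-ext4-4k_1.json"))
  # ('debian13-ai-ext4-4k', False)
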
> def load_results(self) -> bool:
> """Load all result files from the results directory"""
> try:
> @@ -391,6 +453,8 @@ class ResultsAnalyzer:
> html.append(
> " .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
> )
> + html.append(" .baseline-row { background-color: #e8f5e9; }")
> + html.append(" .dev-row { background-color: #e3f2fd; }")
> html.append(" </style>")
> html.append("</head>")
> html.append("<body>")
> @@ -486,26 +550,69 @@ class ResultsAnalyzer:
> else:
> html.append(" <p>No storage device information available.</p>")
>
> - # Filesystem section
> - html.append(" <h3>🗂️ Filesystem Configuration</h3>")
> - fs_info = self.system_info.get("filesystem_info", {})
> - html.append(" <table class='config-table'>")
> - html.append(
> - " <tr><td>Filesystem Type</td><td>"
> - + str(fs_info.get("filesystem_type", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(
> - " <tr><td>Mount Point</td><td>"
> - + str(fs_info.get("mount_point", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(
> - " <tr><td>Mount Options</td><td>"
> - + str(fs_info.get("mount_options", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(" </table>")
> + # Node Configuration section - Extract from actual benchmark results
> + html.append(" <h3>🗂️ Node Configuration</h3>")
> +
> + # Collect node and filesystem information from benchmark results
> + node_configs = {}
> + for result in self.results_data:
> + # Extract node information
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(
> + result
> + )
> +
> + system_info = result.get("system_info", {})
> + data_path = system_info.get("data_path", "/data/milvus")
> + mount_point = system_info.get("mount_point", "/data")
> + kernel_version = system_info.get("kernel_version", "unknown")
> +
> + if hostname not in node_configs:
> + node_configs[hostname] = {
> + "hostname": hostname,
> + "node_type": "Development" if is_dev else "Baseline",
> + "filesystem": fs_type,
> + "block_size": block_size,
> + "data_path": data_path,
> + "mount_point": mount_point,
> + "kernel": kernel_version,
> + "test_count": 0,
> + }
> + node_configs[hostname]["test_count"] += 1
> +
> + if node_configs:
> + html.append(" <table class='config-table'>")
> + html.append(
> + " <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
> + )
> + # Sort nodes with baseline first, then dev
> + sorted_nodes = sorted(
> + node_configs.items(),
> + key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
> + )
> + for hostname, config_info in sorted_nodes:
> + row_class = (
> + "dev-row"
> + if config_info["node_type"] == "Development"
> + else "baseline-row"
> + )
> + html.append(f" <tr class='{row_class}'>")
> + html.append(f" <td><strong>{hostname}</strong></td>")
> + html.append(f" <td>{config_info['node_type']}</td>")
> + html.append(f" <td>{config_info['filesystem']}</td>")
> + html.append(f" <td>{config_info['block_size']}</td>")
> + html.append(f" <td>{config_info['data_path']}</td>")
> + html.append(
> + f" <td>{config_info['mount_point']}</td>"
> + )
> + html.append(f" <td>{config_info['kernel']}</td>")
> + html.append(f" <td>{config_info['test_count']}</td>")
> + html.append(f" </tr>")
> + html.append(" </table>")
> + else:
> + html.append(
> + " <p>No node configuration data found in results.</p>"
> + )
> html.append(" </div>")
>
> # Test Configuration Section
> @@ -551,92 +658,192 @@ class ResultsAnalyzer:
> html.append(" </table>")
> html.append(" </div>")
>
> - # Performance Results Section
> + # Performance Results Section - Per Node
> html.append(" <div class='section'>")
> - html.append(" <h2>📊 Performance Results Summary</h2>")
> + html.append(" <h2>📊 Performance Results by Node</h2>")
>
> if self.results_data:
> - # Insert performance
> - insert_times = [
> - r.get("insert_performance", {}).get("total_time_seconds", 0)
> - for r in self.results_data
> - ]
> - insert_rates = [
> - r.get("insert_performance", {}).get("vectors_per_second", 0)
> - for r in self.results_data
> - ]
> -
> - if insert_times and any(t > 0 for t in insert_times):
> - html.append(" <h3>📈 Vector Insert Performance</h3>")
> - html.append(" <table class='metric-table'>")
> - html.append(
> - f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
> - )
> - html.append(
> - f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
> + # Group results by node
> + node_performance = {}
> +
> + for result in self.results_data:
> + # Use node hostname as the grouping key
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(
> + result
> )
> - html.append(
> - f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
> - )
> - html.append(" </table>")
>
> - # Index performance
> - index_times = [
> - r.get("index_performance", {}).get("creation_time_seconds", 0)
> - for r in self.results_data
> - ]
> - if index_times and any(t > 0 for t in index_times):
> - html.append(" <h3>🔗 Index Creation Performance</h3>")
> - html.append(" <table class='metric-table'>")
> - html.append(
> - f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "hostname": hostname,
> + "node_type": "Development" if is_dev else "Baseline",
> + "insert_rates": [],
> + "insert_times": [],
> + "index_times": [],
> + "query_performance": {},
> + "filesystem": fs_type,
> + "block_size": block_size,
> + }
> +
> + # Add insert performance
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + rate = insert_perf.get("vectors_per_second", 0)
> + time = insert_perf.get("total_time_seconds", 0)
> + if rate > 0:
> + node_performance[hostname]["insert_rates"].append(rate)
> + if time > 0:
> + node_performance[hostname]["insert_times"].append(time)
> +
> + # Add index performance
> + index_perf = result.get("index_performance", {})
> + if index_perf:
> + time = index_perf.get("creation_time_seconds", 0)
> + if time > 0:
> + node_performance[hostname]["index_times"].append(time)
> +
> + # Collect query performance (use first result for each node)
> + query_perf = result.get("query_performance", {})
> + if (
> + query_perf
> + and not node_performance[hostname]["query_performance"]
> + ):
> + node_performance[hostname]["query_performance"] = query_perf
> +
> + # Display results for each node, sorted with baseline first
> + sorted_nodes = sorted(
> + node_performance.items(),
> + key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
> + )
> + for hostname, perf_data in sorted_nodes:
> + node_type_badge = (
> + "🔵" if perf_data["node_type"] == "Development" else "🟢"
> )
> html.append(
> - f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
> + f" <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
> )
> - html.append(" </table>")
> -
> - # Query performance
> - html.append(" <h3>🔍 Query Performance</h3>")
> - first_query_perf = self.results_data[0].get("query_performance", {})
> - if first_query_perf:
> - html.append(" <table>")
> html.append(
> - " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
> + f" <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
> )
>
> - for topk, topk_data in first_query_perf.items():
> - for batch, batch_data in topk_data.items():
> - qps = batch_data.get("queries_per_second", 0)
> - avg_time = batch_data.get("average_time_seconds", 0) * 1000
> -
> - # Color coding for performance
> - qps_class = ""
> - if qps > 1000:
> - qps_class = "performance-good"
> - elif qps > 100:
> - qps_class = "performance-warning"
> - else:
> - qps_class = "performance-poor"
> -
> - html.append(f" <tr>")
> - html.append(
> - f" <td>{topk.replace('topk_', 'Top-')}</td>"
> - )
> - html.append(
> - f" <td>{batch.replace('batch_', 'Batch ')}</td>"
> - )
> - html.append(
> - f" <td class='{qps_class}'>{qps:.2f}</td>"
> - )
> - html.append(f" <td>{avg_time:.2f}</td>")
> - html.append(f" </tr>")
> + # Insert performance
> + insert_rates = perf_data["insert_rates"]
> + if insert_rates:
> + html.append(" <h4>📈 Vector Insert Performance</h4>")
> + html.append(" <table class='metric-table'>")
> + html.append(
> + f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
> + )
> + html.append(" </table>")
> +
> + # Index performance
> + index_times = perf_data["index_times"]
> + if index_times:
> + html.append(" <h4>🔗 Index Creation Performance</h4>")
> + html.append(" <table class='metric-table'>")
> + html.append(
> + f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
> + )
> + html.append(" </table>")
> +
> + # Query performance
> + query_perf = perf_data["query_performance"]
> + if query_perf:
> + html.append(" <h4>🔍 Query Performance</h4>")
> + html.append(" <table>")
> + html.append(
> + " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
> + )
>
> - html.append(" </table>")
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + qps = batch_data.get("queries_per_second", 0)
> + avg_time = (
> + batch_data.get("average_time_seconds", 0) * 1000
> + )
> +
> + # Color coding for performance
> + qps_class = ""
> + if qps > 1000:
> + qps_class = "performance-good"
> + elif qps > 100:
> + qps_class = "performance-warning"
> + else:
> + qps_class = "performance-poor"
> +
> + html.append(f" <tr>")
> + html.append(
> + f" <td>{topk.replace('topk_', 'Top-')}</td>"
> + )
> + html.append(
> + f" <td>{batch.replace('batch_', 'Batch ')}</td>"
> + )
> + html.append(
> + f" <td class='{qps_class}'>{qps:.2f}</td>"
> + )
> + html.append(f" <td>{avg_time:.2f}</td>")
> + html.append(f" </tr>")
> + html.append(" </table>")
> +
> + html.append(" <br>") # Add spacing between configurations
>
> - html.append(" </div>")
> + html.append(" </div>")
>
> # Footer
> + # Performance Graphs Section
> + html.append(" <div class='section'>")
> + html.append(" <h2>📈 Performance Visualizations</h2>")
> + html.append(
> + " <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
> + )
> + html.append(" <ul>")
> + html.append(
> + " <li><strong>Insert Performance:</strong> Shows vector insertion rates and times for each filesystem configuration</li>"
> + )
> + html.append(
> + " <li><strong>Query Performance:</strong> Displays query performance heatmaps for different Top-K and batch sizes</li>"
> + )
> + html.append(
> + " <li><strong>Index Performance:</strong> Compares index creation times across filesystems</li>"
> + )
> + html.append(
> + " <li><strong>Performance Matrix:</strong> Comprehensive comparison matrix of all metrics</li>"
> + )
> + html.append(
> + " <li><strong>Filesystem Comparison:</strong> Side-by-side comparison of filesystem performance</li>"
> + )
> + html.append(" </ul>")
> + html.append(
> + " <p><em>Note: Graphs are generated as separate PNG files in the same directory as this report.</em></p>"
> + )
> + html.append(" <div style='margin-top: 20px;'>")
> + html.append(
> + " <img src='insert_performance.png' alt='Insert Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='query_performance.png' alt='Query Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='index_performance.png' alt='Index Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='performance_matrix.png' alt='Performance Matrix' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='filesystem_comparison.png' alt='Filesystem Comparison' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(" </div>")
> + html.append(" </div>")
> +
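
One nit on the hardcoded image names: the <img> tags above always reference
.png, while the save paths elsewhere in this file derive the extension from
self.config.get("graph_format", "png"). If the format is ever changed these
links go stale; a drop-in sketch (relies on the enclosing method's html list
and self.config, untested) that keeps them in sync:

  ext = self.config.get("graph_format", "png")
  for name, alt in [
      ("insert_performance", "Insert Performance"),
      ("query_performance", "Query Performance"),
      ("index_performance", "Index Performance"),
      ("performance_matrix", "Performance Matrix"),
      ("filesystem_comparison", "Filesystem Comparison"),
  ]:
      html.append(
          f"    <img src='{name}.{ext}' alt='{alt}' "
          "style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
      )
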
> html.append(" <div class='section'>")
> html.append(" <h2>📝 Notes</h2>")
> html.append(" <ul>")
> @@ -661,10 +868,11 @@ class ResultsAnalyzer:
> return "\n".join(html)
>
> except Exception as e:
> - self.logger.error(f"Error generating HTML report: {e}")
> - return (
> - f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
> - )
> + import traceback
> +
> + tb = traceback.format_exc()
> + self.logger.error(f"Error generating HTML report: {e}\n{tb}")
> + return f"<html><body><h1>Error generating HTML report: {e}</h1><pre>{tb}</pre></body></html>"
>
> def generate_graphs(self) -> bool:
> """Generate performance visualization graphs"""
> @@ -691,6 +899,9 @@ class ResultsAnalyzer:
> # Graph 4: Performance Comparison Matrix
> self._plot_performance_matrix()
>
> + # Graph 5: Multi-filesystem Comparison (if applicable)
> + self._plot_filesystem_comparison()
> +
> self.logger.info("Graphs generated successfully")
> return True
>
> @@ -699,34 +910,188 @@ class ResultsAnalyzer:
> return False
>
> def _plot_insert_performance(self):
> - """Plot insert performance metrics"""
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> + """Plot insert performance metrics with node differentiation"""
> + # Group data by node
> + node_performance = {}
>
> - # Extract insert data
> - iterations = []
> - insert_rates = []
> - insert_times = []
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> +
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": is_dev,
> + }
>
> - for i, result in enumerate(self.results_data):
> insert_perf = result.get("insert_performance", {})
> if insert_perf:
> - iterations.append(i + 1)
> - insert_rates.append(insert_perf.get("vectors_per_second", 0))
> - insert_times.append(insert_perf.get("total_time_seconds", 0))
> -
> - # Plot insert rate
> - ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
> - ax1.set_xlabel("Iteration")
> - ax1.set_ylabel("Vectors/Second")
> - ax1.set_title("Vector Insert Rate Performance")
> - ax1.grid(True, alpha=0.3)
> -
> - # Plot insert time
> - ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
> - ax2.set_xlabel("Iteration")
> - ax2.set_ylabel("Total Time (seconds)")
> - ax2.set_title("Vector Insert Time Performance")
> - ax2.grid(True, alpha=0.3)
> + node_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> + )
> + node_performance[hostname]["insert_times"].append(
> + insert_perf.get("total_time_seconds", 0)
> + )
> + node_performance[hostname]["iterations"].append(
> + len(node_performance[hostname]["insert_rates"])
> + )
> +
> + # Check if we have multiple nodes
> + if len(node_performance) > 1:
> + # Multi-node mode: separate lines for each node
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
> +
> + # Sort nodes with baseline first, then dev
> + sorted_nodes = sorted(
> + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0])
> + )
> +
> + # Create color palettes for baseline and dev nodes
> + baseline_colors = [
> + "#2E7D32",
> + "#43A047",
> + "#66BB6A",
> + "#81C784",
> + "#A5D6A7",
> + "#C8E6C9",
> + ] # Greens
> + dev_colors = [
> + "#0D47A1",
> + "#1565C0",
> + "#1976D2",
> + "#1E88E5",
> + "#2196F3",
> + "#42A5F5",
> + "#64B5F6",
> + ] # Blues
> +
> + # Additional colors if needed
> + extra_colors = [
> + "#E65100",
> + "#F57C00",
> + "#FF9800",
> + "#FFB300",
> + "#FFC107",
> + "#FFCA28",
> + ] # Oranges
> +
> + # Line styles to cycle through
> + line_styles = ["-", "--", "-.", ":"]
> + markers = ["o", "s", "^", "v", "D", "p", "*", "h"]
> +
> + baseline_idx = 0
> + dev_idx = 0
> +
> + # Use different colors and styles for each node
> + for idx, (hostname, perf_data) in enumerate(sorted_nodes):
> + if not perf_data["insert_rates"]:
> + continue
> +
> + # Choose color and style based on node type and index
> + if perf_data["is_dev"]:
> + # Development nodes - blues
> + color = dev_colors[dev_idx % len(dev_colors)]
> + linestyle = line_styles[
> + (dev_idx // len(dev_colors)) % len(line_styles)
> + ]
> + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev
> + label = f"{hostname} (Dev)"
> + dev_idx += 1
> + else:
> + # Baseline nodes - greens
> + color = baseline_colors[baseline_idx % len(baseline_colors)]
> + linestyle = line_styles[
> + (baseline_idx // len(baseline_colors)) % len(line_styles)
> + ]
> + marker = markers[
> + baseline_idx % 4
> + ] # Use first 4 markers for baseline
> + label = f"{hostname} (Baseline)"
> + baseline_idx += 1
> +
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate with alpha for better visibility
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + color=color,
> + linestyle=linestyle,
> + marker=marker,
> + linewidth=1.5,
> + markersize=5,
> + label=label,
> + alpha=0.8,
> + )
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + color=color,
> + linestyle=linestyle,
> + marker=marker,
> + linewidth=1.5,
> + markersize=5,
> + label=label,
> + alpha=0.8,
> + )
> +
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Milvus Insert Rate by Node")
> + ax1.grid(True, alpha=0.3)
> + # Position legend outside plot area for better visibility with many nodes
> + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
> +
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Milvus Insert Time by Node")
> + ax2.grid(True, alpha=0.3)
> + # Position legend outside plot area for better visibility with many nodes
> + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
> +
> + plt.suptitle(
> + "Insert Performance Analysis: Baseline vs Development",
> + fontsize=14,
> + y=1.02,
> + )
> + else:
> + # Single node mode: original behavior
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> + # Extract insert data from single node
> + hostname = list(node_performance.keys())[0] if node_performance else None
> + if hostname:
> + perf_data = node_performance[hostname]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + "b-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title(f"Vector Insert Rate Performance - {hostname}")
> + ax1.grid(True, alpha=0.3)
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + "r-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title(f"Vector Insert Time Performance - {hostname}")
> + ax2.grid(True, alpha=0.3)
>
> plt.tight_layout()
> output_file = os.path.join(
> @@ -739,52 +1104,110 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_query_performance(self):
> - """Plot query performance metrics"""
> + """Plot query performance metrics comparing baseline vs dev nodes"""
> if not self.results_data:
> return
>
> - # Collect query performance data
> - query_data = []
> + # Group data by filesystem configuration
> + fs_groups = {}
> for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_groups:
> + fs_groups[config_key] = {"baseline": [], "dev": []}
> +
> query_perf = result.get("query_performance", {})
> - for topk, topk_data in query_perf.items():
> - for batch, batch_data in topk_data.items():
> - query_data.append(
> - {
> - "topk": topk.replace("topk_", ""),
> - "batch": batch.replace("batch_", ""),
> - "qps": batch_data.get("queries_per_second", 0),
> - "avg_time": batch_data.get("average_time_seconds", 0)
> - * 1000, # Convert to ms
> - }
> - )
> + if query_perf:
> + node_type = "dev" if is_dev else "baseline"
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + fs_groups[config_key][node_type].append(
> + {
> + "hostname": hostname,
> + "topk": topk.replace("topk_", ""),
> + "batch": batch.replace("batch_", ""),
> + "qps": batch_data.get("queries_per_second", 0),
> + "avg_time": batch_data.get("average_time_seconds", 0)
> + * 1000,
> + }
> + )
>
> - if not query_data:
> + if not fs_groups:
> return
>
> - df = pd.DataFrame(query_data)
> + # Create subplots for each filesystem config
> + n_configs = len(fs_groups)
> + fig_height = max(8, 4 * n_configs)
> + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height))
>
> - # Create subplots
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> + if n_configs == 1:
> + axes = axes.reshape(1, -1)
>
> - # QPS heatmap
> - qps_pivot = df.pivot_table(
> - values="qps", index="topk", columns="batch", aggfunc="mean"
> - )
> - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
> - ax1.set_title("Queries Per Second (QPS)")
> - ax1.set_xlabel("Batch Size")
> - ax1.set_ylabel("Top-K")
> -
> - # Latency heatmap
> - latency_pivot = df.pivot_table(
> - values="avg_time", index="topk", columns="batch", aggfunc="mean"
> - )
> - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
> - ax2.set_title("Average Query Latency (ms)")
> - ax2.set_xlabel("Batch Size")
> - ax2.set_ylabel("Top-K")
> + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())):
> + # Create DataFrames for baseline and dev
> + baseline_df = (
> + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame()
> + )
> + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame()
> +
> + # Baseline QPS heatmap
> + ax_base = axes[idx][0]
> + if not baseline_df.empty:
> + baseline_pivot = baseline_df.pivot_table(
> + values="qps", index="topk", columns="batch", aggfunc="mean"
> + )
> + sns.heatmap(
> + baseline_pivot,
> + annot=True,
> + fmt=".1f",
> + ax=ax_base,
> + cmap="Greens",
> + cbar_kws={"label": "QPS"},
> + )
> + ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
> + ax_base.set_xlabel("Batch Size")
> + ax_base.set_ylabel("Top-K")
> + else:
> + ax_base.text(
> + 0.5,
> + 0.5,
> + f"No baseline data for {config_key}",
> + ha="center",
> + va="center",
> + transform=ax_base.transAxes,
> + )
> + ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
>
> + # Dev QPS heatmap
> + ax_dev = axes[idx][1]
> + if not dev_df.empty:
> + dev_pivot = dev_df.pivot_table(
> + values="qps", index="topk", columns="batch", aggfunc="mean"
> + )
> + sns.heatmap(
> + dev_pivot,
> + annot=True,
> + fmt=".1f",
> + ax=ax_dev,
> + cmap="Blues",
> + cbar_kws={"label": "QPS"},
> + )
> + ax_dev.set_title(f"{config_key.upper()} - Development QPS")
> + ax_dev.set_xlabel("Batch Size")
> + ax_dev.set_ylabel("Top-K")
> + else:
> + ax_dev.text(
> + 0.5,
> + 0.5,
> + f"No dev data for {config_key}",
> + ha="center",
> + va="center",
> + transform=ax_dev.transAxes,
> + )
> + ax_dev.set_title(f"{config_key.upper()} - Development QPS")
> +
> + plt.suptitle("Query Performance: Baseline vs Development", fontsize=16, y=1.02)
> plt.tight_layout()
> output_file = os.path.join(
> self.output_dir,
> @@ -796,32 +1219,101 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_index_performance(self):
> - """Plot index creation performance"""
> - iterations = []
> - index_times = []
> + """Plot index creation performance comparing baseline vs dev"""
> + # Group by filesystem configuration
> + fs_groups = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_groups:
> + fs_groups[config_key] = {"baseline": [], "dev": []}
>
> - for i, result in enumerate(self.results_data):
> index_perf = result.get("index_performance", {})
> if index_perf:
> - iterations.append(i + 1)
> - index_times.append(index_perf.get("creation_time_seconds", 0))
> + time = index_perf.get("creation_time_seconds", 0)
> + if time > 0:
> + node_type = "dev" if is_dev else "baseline"
> + fs_groups[config_key][node_type].append(time)
>
> - if not index_times:
> + if not fs_groups:
> return
>
> - plt.figure(figsize=(10, 6))
> - plt.bar(iterations, index_times, alpha=0.7, color="green")
> - plt.xlabel("Iteration")
> - plt.ylabel("Index Creation Time (seconds)")
> - plt.title("Index Creation Performance")
> - plt.grid(True, alpha=0.3)
> -
> - # Add average line
> - avg_time = np.mean(index_times)
> - plt.axhline(
> - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
> + # Create comparison bar chart
> + fig, ax = plt.subplots(figsize=(14, 8))
> +
> + configs = sorted(fs_groups.keys())
> + x = np.arange(len(configs))
> + width = 0.35
> +
> + # Calculate averages for each config
> + baseline_avgs = []
> + dev_avgs = []
> + baseline_stds = []
> + dev_stds = []
> +
> + for config in configs:
> + baseline_times = fs_groups[config]["baseline"]
> + dev_times = fs_groups[config]["dev"]
> +
> + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0)
> + dev_avgs.append(np.mean(dev_times) if dev_times else 0)
> + baseline_stds.append(np.std(baseline_times) if baseline_times else 0)
> + dev_stds.append(np.std(dev_times) if dev_times else 0)
> +
> + # Create bars
> + bars1 = ax.bar(
> + x - width / 2,
> + baseline_avgs,
> + width,
> + yerr=baseline_stds,
> + label="Baseline",
> + color="#4CAF50",
> + capsize=5,
> + )
> + bars2 = ax.bar(
> + x + width / 2,
> + dev_avgs,
> + width,
> + yerr=dev_stds,
> + label="Development",
> + color="#2196F3",
> + capsize=5,
> )
> - plt.legend()
> +
> + # Add value labels on bars
> + for bar, val in zip(bars1, baseline_avgs):
> + if val > 0:
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.3f}s",
> + ha="center",
> + va="bottom",
> + fontsize=9,
> + )
> +
> + for bar, val in zip(bars2, dev_avgs):
> + if val > 0:
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.3f}s",
> + ha="center",
> + va="bottom",
> + fontsize=9,
> + )
> +
> + ax.set_xlabel("Filesystem Configuration", fontsize=12)
> + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12)
> + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14)
> + ax.set_xticks(x)
> + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right")
> + ax.legend(loc="upper right")
> + ax.grid(True, alpha=0.3, axis="y")
>
> output_file = os.path.join(
> self.output_dir,
> @@ -833,61 +1325,148 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_performance_matrix(self):
> - """Plot comprehensive performance comparison matrix"""
> + """Plot performance comparison matrix for each filesystem config"""
> if len(self.results_data) < 2:
> return
>
> - # Extract key metrics for comparison
> - metrics = []
> - for i, result in enumerate(self.results_data):
> + # Group by filesystem configuration
> + fs_metrics = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_metrics:
> + fs_metrics[config_key] = {"baseline": [], "dev": []}
> +
> + # Collect metrics
> insert_perf = result.get("insert_performance", {})
> index_perf = result.get("index_performance", {})
> + query_perf = result.get("query_performance", {})
>
> metric = {
> - "iteration": i + 1,
> + "hostname": hostname,
> "insert_rate": insert_perf.get("vectors_per_second", 0),
> "index_time": index_perf.get("creation_time_seconds", 0),
> }
>
> - # Add query metrics
> - query_perf = result.get("query_performance", {})
> + # Get representative query performance (topk_10, batch_1)
> if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
> metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
> "queries_per_second", 0
> )
> + else:
> + metric["query_qps"] = 0
>
> - metrics.append(metric)
> + node_type = "dev" if is_dev else "baseline"
> + fs_metrics[config_key][node_type].append(metric)
>
> - df = pd.DataFrame(metrics)
> + if not fs_metrics:
> + return
>
> - # Normalize metrics for comparison
> - numeric_cols = ["insert_rate", "index_time", "query_qps"]
> - for col in numeric_cols:
> - if col in df.columns:
> - df[f"{col}_norm"] = (df[col] - df[col].min()) / (
> - df[col].max() - df[col].min() + 1e-6
> - )
> + # Create subplots for each filesystem
> + n_configs = len(fs_metrics)
> + n_cols = min(3, n_configs)
> + n_rows = (n_configs + n_cols - 1) // n_cols
> +
> + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5))
> + if n_rows == 1 and n_cols == 1:
> + axes = [[axes]]
> + elif n_rows == 1:
> + axes = [axes]
> + elif n_cols == 1:
> + axes = [[ax] for ax in axes]
> +
> + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())):
> + row = idx // n_cols
> + col = idx % n_cols
> + ax = axes[row][col]
> +
> + # Calculate averages
> + baseline_metrics = data["baseline"]
> + dev_metrics = data["dev"]
> +
> + if baseline_metrics and dev_metrics:
> + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"]
> +
> + baseline_avg = [
> + np.mean([m["insert_rate"] for m in baseline_metrics]),
> + np.mean([m["index_time"] for m in baseline_metrics]),
> + np.mean([m["query_qps"] for m in baseline_metrics]),
> + ]
>
> - # Create radar chart
> - fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
> + dev_avg = [
> + np.mean([m["insert_rate"] for m in dev_metrics]),
> + np.mean([m["index_time"] for m in dev_metrics]),
> + np.mean([m["query_qps"] for m in dev_metrics]),
> + ]
>
> - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
> - angles += angles[:1] # Complete the circle
> + x = np.arange(len(categories))
> + width = 0.35
>
> - for i, row in df.iterrows():
> - values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
> - values += values[:1] # Complete the circle
> + bars1 = ax.bar(
> + x - width / 2,
> + baseline_avg,
> + width,
> + label="Baseline",
> + color="#4CAF50",
> + )
> + bars2 = ax.bar(
> + x + width / 2, dev_avg, width, label="Development", color="#2196F3"
> + )
>
> - ax.plot(
> - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
> - )
> - ax.fill(angles, values, alpha=0.25)
> + # Add value labels
> + for bar, val in zip(bars1, baseline_avg):
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.0f}" if val > 100 else f"{val:.2f}",
> + ha="center",
> + va="bottom",
> + fontsize=8,
> + )
>
> - ax.set_xticks(angles[:-1])
> - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
> - ax.set_ylim(0, 1)
> - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
> - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
> + for bar, val in zip(bars2, dev_avg):
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.0f}" if val > 100 else f"{val:.2f}",
> + ha="center",
> + va="bottom",
> + fontsize=8,
> + )
> +
> + ax.set_xlabel("Metrics")
> + ax.set_ylabel("Value")
> + ax.set_title(f"{config_key.upper()}")
> + ax.set_xticks(x)
> + ax.set_xticklabels(categories)
> + ax.legend(loc="upper right", fontsize=8)
> + ax.grid(True, alpha=0.3, axis="y")
> + else:
> + ax.text(
> + 0.5,
> + 0.5,
> + f"Insufficient data\nfor {config_key}",
> + ha="center",
> + va="center",
> + transform=ax.transAxes,
> + )
> + ax.set_title(f"{config_key.upper()}")
> +
> + # Hide unused subplots
> + for idx in range(n_configs, n_rows * n_cols):
> + row = idx // n_cols
> + col = idx % n_cols
> + axes[row][col].set_visible(False)
> +
> + plt.suptitle(
> + "Performance Comparison Matrix: Baseline vs Development",
> + fontsize=14,
> + y=1.02,
> + )
>
> output_file = os.path.join(
> self.output_dir,
> @@ -898,6 +1477,149 @@ class ResultsAnalyzer:
> )
> plt.close()
>
> + def _plot_filesystem_comparison(self):
> + """Plot node performance comparison chart"""
> + if len(self.results_data) < 2:
> + return
> +
> + # Group results by node
> + node_performance = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> +
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "insert_rates": [],
> + "index_times": [],
> + "query_qps": [],
> + "is_dev": is_dev,
> + }
> +
> + # Collect metrics
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + node_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> + )
> +
> + index_perf = result.get("index_performance", {})
> + if index_perf:
> + node_performance[hostname]["index_times"].append(
> + index_perf.get("creation_time_seconds", 0)
> + )
> +
> + # Get top-10 batch-1 query performance as representative
> + query_perf = result.get("query_performance", {})
> + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
> + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0)
> + node_performance[hostname]["query_qps"].append(qps)
> +
> + # Only create comparison if we have multiple nodes
> + if len(node_performance) > 1:
> + # Calculate averages
> + node_metrics = {}
> + for hostname, perf_data in node_performance.items():
> + node_metrics[hostname] = {
> + "avg_insert_rate": (
> + np.mean(perf_data["insert_rates"])
> + if perf_data["insert_rates"]
> + else 0
> + ),
> + "avg_index_time": (
> + np.mean(perf_data["index_times"])
> + if perf_data["index_times"]
> + else 0
> + ),
> + "avg_query_qps": (
> + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0
> + ),
> + "is_dev": perf_data["is_dev"],
> + }
> +
> + # Create comparison bar chart with more space
> + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
> +
> + # Sort nodes with baseline first
> + sorted_nodes = sorted(
> + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0])
> + )
> + node_names = [hostname for hostname, _ in sorted_nodes]
> +
> + # Use different colors for baseline vs dev
> + colors = [
> + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3"
> + for hostname in node_names
> + ]
> +
> + # Add labels for clarity
> + labels = [
> + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})"
> + for hostname in node_names
> + ]
> +
> + # Insert rate comparison
> + insert_rates = [
> + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names
> + ]
> + bars1 = ax1.bar(labels, insert_rates, color=colors)
> + ax1.set_title("Average Milvus Insert Rate by Node")
> + ax1.set_ylabel("Vectors/Second")
> + # Rotate labels for better readability
> + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Index time comparison (lower is better)
> + index_times = [
> + node_metrics[hostname]["avg_index_time"] for hostname in node_names
> + ]
> + bars2 = ax2.bar(labels, index_times, color=colors)
> + ax2.set_title("Average Milvus Index Time by Node")
> + ax2.set_ylabel("Seconds (Lower is Better)")
> + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Query QPS comparison
> + query_qps = [
> + node_metrics[hostname]["avg_query_qps"] for hostname in node_names
> + ]
> + bars3 = ax3.bar(labels, query_qps, color=colors)
> + ax3.set_title("Average Milvus Query QPS by Node")
> + ax3.set_ylabel("Queries/Second")
> + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Add value labels on bars
> + for bars, values in [
> + (bars1, insert_rates),
> + (bars2, index_times),
> + (bars3, query_qps),
> + ]:
> + for bar, value in zip(bars, values):
> + height = bar.get_height()
> + ax = bar.axes
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height + height * 0.01,
> + f"{value:.1f}",
> + ha="center",
> + va="bottom",
> + fontsize=10,
> + )
> +
> + plt.suptitle(
> + "Milvus Performance Comparison: Baseline vs Development Nodes",
> + fontsize=16,
> + y=1.02,
> + )
> + plt.tight_layout()
> +
> + output_file = os.path.join(
> + self.output_dir,
> + f"filesystem_comparison.{self.config.get('graph_format', 'png')}",
> + )
> + plt.savefig(
> + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
> + )
> + plt.close()
> +
> def analyze(self) -> bool:
> """Run complete analysis"""
> self.logger.info("Starting results analysis...")
> diff --git a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
> index 645bac9e..b3681ff9 100755
> --- a/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
> +++ b/playbooks/roles/ai_collect_results/files/generate_better_graphs.py
> @@ -29,17 +29,18 @@ def extract_filesystem_from_filename(filename):
> if "_" in node_name:
> parts = node_name.split("_")
> node_name = "_".join(parts[:-1]) # Remove last part (iteration)
> -
> +
> # Extract filesystem type from node name
> if "-xfs-" in node_name:
> return "xfs"
> elif "-ext4-" in node_name:
> - return "ext4"
> + return "ext4"
> elif "-btrfs-" in node_name:
> return "btrfs"
> -
> +
> return "unknown"
>
> +
> def extract_node_config_from_filename(filename):
> """Extract detailed node configuration from filename"""
> # Expected format: results_debian13-ai-xfs-4k-4ks_1.json
> @@ -50,14 +51,15 @@ def extract_node_config_from_filename(filename):
> if "_" in node_name:
> parts = node_name.split("_")
> node_name = "_".join(parts[:-1]) # Remove last part (iteration)
> -
> +
> # Remove -dev suffix if present
> node_name = node_name.replace("-dev", "")
> -
> +
> return node_name.replace("debian13-ai-", "")
> -
> +
> return "unknown"
>
> +
> def detect_filesystem():
> """Detect the filesystem type of /data on test nodes"""
> # This is now a fallback - we primarily use filename-based detection
> @@ -104,7 +106,7 @@ def load_results(results_dir):
> # Extract node type from filename
> filename = os.path.basename(json_file)
> data["filename"] = filename
> -
> +
> # Extract filesystem type and config from filename
> data["filesystem"] = extract_filesystem_from_filename(filename)
> data["node_config"] = extract_node_config_from_filename(filename)
> diff --git a/playbooks/roles/ai_collect_results/files/generate_graphs.py b/playbooks/roles/ai_collect_results/files/generate_graphs.py
> index 53a835e2..fafc62bf 100755
> --- a/playbooks/roles/ai_collect_results/files/generate_graphs.py
> +++ b/playbooks/roles/ai_collect_results/files/generate_graphs.py
> @@ -9,7 +9,6 @@ import sys
> import glob
> import numpy as np
> import matplotlib
> -
> matplotlib.use("Agg") # Use non-interactive backend
> import matplotlib.pyplot as plt
> from datetime import datetime
> @@ -17,68 +16,78 @@ from pathlib import Path
> from collections import defaultdict
>
>
> +def _extract_filesystem_config(result):
> + """Extract filesystem type and block size from result data.
> + Returns (fs_type, block_size, config_key)"""
> + filename = result.get("_file", "")
> +
> + # Primary: Extract filesystem type from filename (more reliable than JSON)
> + fs_type = "unknown"
> + block_size = "default"
> +
> + if "xfs" in filename:
> + fs_type = "xfs"
> +        # Check larger block sizes first: "4k-" is a substring of "64k-"
> +        if "64k-" in filename:
> +            block_size = "64k"
> +        elif "32k-" in filename:
> +            block_size = "32k"
> +        elif "16k-" in filename:
> +            block_size = "16k"
> +        elif "4k-" in filename:
> +            block_size = "4k"
> +    elif "ext4" in filename:
> +        fs_type = "ext4"
> +        if "16k" in filename:
> +            block_size = "16k"
> +        elif "4k" in filename:
> +            block_size = "4k"
> + elif "btrfs" in filename:
> + fs_type = "btrfs"
> +
> + # Fallback: Check JSON data if filename parsing failed
> + if fs_type == "unknown":
> + fs_type = result.get("filesystem", "unknown")
> +
> + # Create descriptive config key
> + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
> + return fs_type, block_size, config_key
> +
> +
> +def _extract_node_info(result):
> + """Extract node hostname and determine if it's a dev node.
> + Returns (hostname, is_dev_node)"""
> + # Get hostname from system_info (preferred) or fall back to filename
> + system_info = result.get("system_info", {})
> + hostname = system_info.get("hostname", "")
> +
> + # If no hostname in system_info, try extracting from filename
> + if not hostname:
> + filename = result.get("_file", "")
> + # Remove results_ prefix and .json suffix
> + hostname = filename.replace("results_", "").replace(".json", "")
> + # Remove iteration number if present (_1, _2, etc.)
> + if "_" in hostname and hostname.split("_")[-1].isdigit():
> + hostname = "_".join(hostname.split("_")[:-1])
> +
> + # Determine if this is a dev node
> + is_dev = hostname.endswith("-dev")
> +
> + return hostname, is_dev
> +
> +
> def load_results(results_dir):
> """Load all JSON result files from the directory"""
> results = []
> - json_files = glob.glob(os.path.join(results_dir, "*.json"))
> + # Only load results_*.json files, not consolidated or other JSON files
> + json_files = glob.glob(os.path.join(results_dir, "results_*.json"))
>
> for json_file in json_files:
> try:
> with open(json_file, "r") as f:
> data = json.load(f)
> - # Extract filesystem info - prefer from JSON data over filename
> - filename = os.path.basename(json_file)
> -
> - # First, try to get filesystem from the JSON data itself
> - fs_type = data.get("filesystem", None)
> -
> - # If not in JSON, try to parse from filename (backwards compatibility)
> - if not fs_type:
> - parts = filename.replace("results_", "").replace(".json", "").split("-")
> -
> - # Parse host info
> - if "debian13-ai-" in filename:
> - host_parts = (
> - filename.replace("results_debian13-ai-", "")
> - .replace("_1.json", "")
> - .replace("_2.json", "")
> - .replace("_3.json", "")
> - .split("-")
> - )
> - if "xfs" in host_parts[0]:
> - fs_type = "xfs"
> - # Extract block size (e.g., "4k", "16k", etc.)
> - block_size = host_parts[1] if len(host_parts) > 1 else "unknown"
> - elif "ext4" in host_parts[0]:
> - fs_type = "ext4"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "btrfs" in host_parts[0]:
> - fs_type = "btrfs"
> - block_size = "default"
> - else:
> - fs_type = "unknown"
> - block_size = "unknown"
> - else:
> - fs_type = "unknown"
> - block_size = "unknown"
> - else:
> - # If filesystem came from JSON, set appropriate block size
> - if fs_type == "btrfs":
> - block_size = "default"
> - elif fs_type in ["ext4", "xfs"]:
> - block_size = data.get("block_size", "4k")
> - else:
> - block_size = data.get("block_size", "default")
> -
> - is_dev = "dev" in filename
> -
> - # Use filesystem from JSON if available, otherwise use parsed value
> - if "filesystem" not in data:
> - data["filesystem"] = fs_type
> - data["block_size"] = block_size
> - data["is_dev"] = is_dev
> - data["filename"] = filename
> -
> + # Add filename for filesystem detection
> + data["_file"] = os.path.basename(json_file)
> results.append(data)
> except Exception as e:
> print(f"Error loading {json_file}: {e}")
> @@ -86,554 +95,243 @@ def load_results(results_dir):
> return results
>
>
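
Restricting the glob to results_*.json looks right, since any consolidated or
summary JSON written into the same directory would otherwise be picked up as
benchmark data (going by the comment above; the second file name below is
illustrative only). A two-line check of the pattern:

  import fnmatch
  files = ["results_debian13-ai-ext4-4k_1.json", "consolidated_results.json"]
  print([f for f in files if fnmatch.fnmatch(f, "results_*.json")])
  # ['results_debian13-ai-ext4-4k_1.json']
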
> -def create_filesystem_comparison_chart(results, output_dir):
> - """Create a bar chart comparing performance across filesystems"""
> - # Group by filesystem and baseline/dev
> - fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - category = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Extract actual performance data from results
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> - fs_data[fs][category].append(insert_qps)
> -
> - # Prepare data for plotting
> - filesystems = list(fs_data.keys())
> - baseline_means = [
> - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
> - for fs in filesystems
> - ]
> - dev_means = [
> - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
> - ]
> -
> - x = np.arange(len(filesystems))
> - width = 0.35
> -
> - fig, ax = plt.subplots(figsize=(10, 6))
> - baseline_bars = ax.bar(
> - x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
> - )
> - dev_bars = ax.bar(
> - x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
> - )
> -
> - ax.set_xlabel("Filesystem")
> - ax.set_ylabel("Insert QPS")
> - ax.set_title("Vector Database Performance by Filesystem")
> - ax.set_xticks(x)
> - ax.set_xticklabels(filesystems)
> - ax.legend()
> - ax.grid(True, alpha=0.3)
> -
> - # Add value labels on bars
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax.annotate(
> - f"{height:.0f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - )
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_block_size_analysis(results, output_dir):
> - """Create analysis for different block sizes (XFS specific)"""
> - # Filter XFS results
> - xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
> -
> - if not xfs_results:
> +def create_simple_performance_trends(results, output_dir):
> + """Create multi-node performance trends chart"""
> + if not results:
> return
>
> - # Group by block size
> - block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in xfs_results:
> - block_size = result.get("block_size", "unknown")
> - category = "dev" if result.get("is_dev", False) else "baseline"
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> - block_size_data[block_size][category].append(insert_qps)
> -
> - # Sort block sizes
> - block_sizes = sorted(
> - block_size_data.keys(),
> - key=lambda x: (
> - int(x.replace("k", "").replace("s", ""))
> - if x not in ["unknown", "default"]
> - else 0
> - ),
> - )
> -
> - # Create grouped bar chart
> - baseline_means = [
> - (
> - np.mean(block_size_data[bs]["baseline"])
> - if block_size_data[bs]["baseline"]
> - else 0
> - )
> - for bs in block_sizes
> - ]
> - dev_means = [
> - np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
> - for bs in block_sizes
> - ]
> -
> - x = np.arange(len(block_sizes))
> - width = 0.35
> -
> - fig, ax = plt.subplots(figsize=(12, 6))
> - baseline_bars = ax.bar(
> - x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
> - )
> - dev_bars = ax.bar(
> - x + width / 2, dev_means, width, label="Development", color="#d62728"
> - )
> -
> - ax.set_xlabel("Block Size")
> - ax.set_ylabel("Insert QPS")
> - ax.set_title("XFS Performance by Block Size")
> - ax.set_xticks(x)
> - ax.set_xticklabels(block_sizes)
> - ax.legend()
> - ax.grid(True, alpha=0.3)
> -
> - # Add value labels
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax.annotate(
> - f"{height:.0f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - )
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_heatmap_analysis(results, output_dir):
> - """Create a heatmap showing performance across all configurations"""
> - # Group data by configuration and version
> - config_data = defaultdict(
> - lambda: {
> - "baseline": {"insert": 0, "query": 0},
> - "dev": {"insert": 0, "query": 0},
> - }
> - )
> + # Group results by node; each node name encodes a filesystem configuration
> + fs_performance = defaultdict(lambda: {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": False,
> + })
>
> for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - config = f"{fs}-{block_size}"
> - version = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Get actual insert performance
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> -
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - config_data[config][version]["insert"] = insert_qps
> - config_data[config][version]["query"] = query_qps
> -
> - # Sort configurations
> - configs = sorted(config_data.keys())
> -
> - # Prepare data for heatmap
> - insert_baseline = [config_data[c]["baseline"]["insert"] for c in configs]
> - insert_dev = [config_data[c]["dev"]["insert"] for c in configs]
> - query_baseline = [config_data[c]["baseline"]["query"] for c in configs]
> - query_dev = [config_data[c]["dev"]["query"] for c in configs]
> -
> - # Create figure with custom heatmap
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
> -
> - # Create data matrices
> - insert_data = np.array([insert_baseline, insert_dev]).T
> - query_data = np.array([query_baseline, query_dev]).T
> -
> - # Insert QPS heatmap
> - im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
> - ax1.set_xticks([0, 1])
> - ax1.set_xticklabels(["Baseline", "Development"])
> - ax1.set_yticks(range(len(configs)))
> - ax1.set_yticklabels(configs)
> - ax1.set_title("Insert Performance Heatmap")
> - ax1.set_ylabel("Configuration")
> -
> - # Add text annotations
> - for i in range(len(configs)):
> - for j in range(2):
> - text = ax1.text(
> - j,
> - i,
> - f"{int(insert_data[i, j])}",
> - ha="center",
> - va="center",
> - color="black",
> - )
> + hostname, is_dev = _extract_node_info(result)
> +
> + if hostname not in fs_performance:
> + fs_performance[hostname] = {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": is_dev,
> + }
>
> - # Add colorbar
> - cbar1 = plt.colorbar(im1, ax=ax1)
> - cbar1.set_label("Insert QPS")
> -
> - # Query QPS heatmap
> - im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
> - ax2.set_xticks([0, 1])
> - ax2.set_xticklabels(["Baseline", "Development"])
> - ax2.set_yticks(range(len(configs)))
> - ax2.set_yticklabels(configs)
> - ax2.set_title("Query Performance Heatmap")
> -
> - # Add text annotations
> - for i in range(len(configs)):
> - for j in range(2):
> - text = ax2.text(
> - j,
> - i,
> - f"{int(query_data[i, j])}",
> - ha="center",
> - va="center",
> - color="black",
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + fs_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> + )
> + fs_performance[hostname]["insert_times"].append(
> + insert_perf.get("total_time_seconds", 0)
> + )
> + fs_performance[hostname]["iterations"].append(
> + len(fs_performance[hostname]["insert_rates"])
> )
>
> - # Add colorbar
> - cbar2 = plt.colorbar(im2, ax=ax2)
> - cbar2.set_label("Query QPS")
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_performance_trends(results, output_dir):
> - """Create line charts showing performance trends"""
> - # Group by filesystem type
> - fs_types = defaultdict(
> - lambda: {
> - "configs": [],
> - "baseline_insert": [],
> - "dev_insert": [],
> - "baseline_query": [],
> - "dev_query": [],
> - }
> - )
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - config = f"{block_size}"
> -
> - if config not in fs_types[fs]["configs"]:
> - fs_types[fs]["configs"].append(config)
> - fs_types[fs]["baseline_insert"].append(0)
> - fs_types[fs]["dev_insert"].append(0)
> - fs_types[fs]["baseline_query"].append(0)
> - fs_types[fs]["dev_query"].append(0)
> -
> - idx = fs_types[fs]["configs"].index(config)
> -
> - # Calculate average query QPS from all test configurations
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - if result.get("is_dev", False):
> - if "insert_performance" in result:
> - fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
> - "vectors_per_second", 0
> - )
> - fs_types[fs]["dev_query"][idx] = query_qps
> - else:
> - if "insert_performance" in result:
> - fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
> - "vectors_per_second", 0
> - )
> - fs_types[fs]["baseline_query"][idx] = query_qps
> -
> - # Create separate plots for each filesystem
> - for fs, data in fs_types.items():
> - if not data["configs"]:
> - continue
> -
> - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
> -
> - x = range(len(data["configs"]))
> -
> - # Insert performance
> - ax1.plot(
> - x,
> - data["baseline_insert"],
> - "o-",
> - label="Baseline",
> - linewidth=2,
> - markersize=8,
> - )
> - ax1.plot(
> - x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
> - )
> - ax1.set_xlabel("Configuration")
> - ax1.set_ylabel("Insert QPS")
> - ax1.set_title(f"{fs.upper()} Insert Performance")
> - ax1.set_xticks(x)
> - ax1.set_xticklabels(data["configs"])
> - ax1.legend()
> + # Check if we have multi-filesystem data
> + if len(fs_performance) > 1:
> + # Multi-filesystem mode: separate lines for each filesystem
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> + colors = ["b", "r", "g", "m", "c", "y", "k"]
> + color_idx = 0
> +
> + for config_key, perf_data in fs_performance.items():
> + if not perf_data["insert_rates"]:
> + continue
> +
> + color = colors[color_idx % len(colors)]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + f"{color}-o",
> + linewidth=2,
> + markersize=6,
> + label=config_key.upper(),
> + )
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + f"{color}-o",
> + linewidth=2,
> + markersize=6,
> + label=config_key.upper(),
> + )
> +
> + color_idx += 1
> +
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Milvus Insert Rate by Storage Filesystem")
> ax1.grid(True, alpha=0.3)
> -
> - # Query performance
> - ax2.plot(
> - x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
> - )
> - ax2.plot(
> - x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
> - )
> - ax2.set_xlabel("Configuration")
> - ax2.set_ylabel("Query QPS")
> - ax2.set_title(f"{fs.upper()} Query Performance")
> - ax2.set_xticks(x)
> - ax2.set_xticklabels(data["configs"])
> - ax2.legend()
> + ax1.legend()
> +
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Milvus Insert Time by Storage Filesystem")
> ax2.grid(True, alpha=0.3)
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
> - plt.close()
> + ax2.legend()
> + else:
> + # Single filesystem mode: original behavior
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> + # Extract insert data from single filesystem
> + config_key = list(fs_performance.keys())[0] if fs_performance else None
> + if config_key:
> + perf_data = fs_performance[config_key]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + "b-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Vector Insert Rate Performance")
> + ax1.grid(True, alpha=0.3)
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + "r-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Vector Insert Time Performance")
> + ax2.grid(True, alpha=0.3)
> +
> + plt.tight_layout()
> + plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
> + plt.close()
>
>
> -def create_simple_performance_trends(results, output_dir):
> - """Create a simple performance trends chart for basic Milvus testing"""
> +def create_heatmap_analysis(results, output_dir):
> + """Create multi-filesystem heatmap showing query performance"""
> if not results:
> return
> -
> - # Separate baseline and dev results
> - baseline_results = [r for r in results if not r.get("is_dev", False)]
> - dev_results = [r for r in results if r.get("is_dev", False)]
> -
> - if not baseline_results and not dev_results:
> - return
> -
> - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
> -
> - # Prepare data
> - baseline_insert = []
> - baseline_query = []
> - dev_insert = []
> - dev_query = []
> - labels = []
> -
> - # Process baseline results
> - for i, result in enumerate(baseline_results):
> - if "insert_performance" in result:
> - baseline_insert.append(result["insert_performance"].get("vectors_per_second", 0))
> - else:
> - baseline_insert.append(0)
> +
> + # Group data by filesystem configuration
> + fs_performance = defaultdict(lambda: {
> + "query_data": [],
> + "config_key": "",
> + })
> +
> + for result in results:
> + fs_type, block_size, config_key = _extract_filesystem_config(result)
>
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> - baseline_query.append(query_qps)
> - labels.append(f"Run {i+1}")
> -
> - # Process dev results
> - for result in dev_results:
> - if "insert_performance" in result:
> - dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
> - else:
> - dev_insert.append(0)
> + query_perf = result.get("query_performance", {})
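> + # query_performance is nested as topk_<N> -> batch_<M> -> {"queries_per_second": ...}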
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + qps = batch_data.get("queries_per_second", 0)
> + fs_performance[config_key]["query_data"].append({
> + "topk": topk,
> + "batch": batch,
> + "qps": qps,
> + })
> + fs_performance[config_key]["config_key"] = config_key
> +
> + # Check if we have multi-filesystem data
> + if len(fs_performance) > 1:
> + # Multi-filesystem mode: separate heatmaps for each filesystem
> + num_fs = len(fs_performance)
> + fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
> + if num_fs == 1:
> + axes = [axes]
>
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get("queries_per_second", 0)
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> - dev_query.append(query_qps)
> -
> - x = range(len(baseline_results) if baseline_results else len(dev_results))
> -
> - # Insert performance
> - if baseline_insert:
> - ax1.plot(x, baseline_insert, "o-", label="Baseline", linewidth=2, markersize=8)
> - if dev_insert:
> - ax1.plot(x[:len(dev_insert)], dev_insert, "s-", label="Development", linewidth=2, markersize=8)
> - ax1.set_xlabel("Test Run")
> - ax1.set_ylabel("Insert QPS")
> - ax1.set_title("Milvus Insert Performance")
> - ax1.set_xticks(x)
> - ax1.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
> - ax1.legend()
> - ax1.grid(True, alpha=0.3)
> -
> - # Query performance
> - if baseline_query:
> - ax2.plot(x, baseline_query, "o-", label="Baseline", linewidth=2, markersize=8)
> - if dev_query:
> - ax2.plot(x[:len(dev_query)], dev_query, "s-", label="Development", linewidth=2, markersize=8)
> - ax2.set_xlabel("Test Run")
> - ax2.set_ylabel("Query QPS")
> - ax2.set_title("Milvus Query Performance")
> - ax2.set_xticks(x)
> - ax2.set_xticklabels(labels if labels else [f"Run {i+1}" for i in x])
> - ax2.legend()
> - ax2.grid(True, alpha=0.3)
> + # Define common structure for consistency
> + topk_order = ["topk_1", "topk_10", "topk_100"]
> + batch_order = ["batch_1", "batch_10", "batch_100"]
> +
> + for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
> + # Create matrix for this filesystem
> + matrix = np.zeros((len(topk_order), len(batch_order)))
> +
> + # Fill matrix with data
> + query_dict = {}
> + for item in perf_data["query_data"]:
> + query_dict[(item["topk"], item["batch"])] = item["qps"]
> +
> + for i, topk in enumerate(topk_order):
> + for j, batch in enumerate(batch_order):
> + matrix[i, j] = query_dict.get((topk, batch), 0)
> +
> + # Plot heatmap
> + im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
> + axes[idx].set_title(f"{config_key.upper()} Query Performance")
> + axes[idx].set_xticks(range(len(batch_order)))
> + axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
> + axes[idx].set_yticks(range(len(topk_order)))
> + axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
> +
> + # Add text annotations
> + for i in range(len(topk_order)):
> + for j in range(len(batch_order)):
> + axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
> + ha="center", va="center", color="white", fontweight="bold")
> +
> + # Add colorbar
> + cbar = plt.colorbar(im, ax=axes[idx])
> + cbar.set_label('Queries Per Second (QPS)')
> + else:
> + # Single filesystem mode
> + fig, ax = plt.subplots(1, 1, figsize=(8, 6))
> +
> + if fs_performance:
> + config_key = list(fs_performance.keys())[0]
> + perf_data = fs_performance[config_key]
> +
> + # Create matrix
> + topk_order = ["topk_1", "topk_10", "topk_100"]
> + batch_order = ["batch_1", "batch_10", "batch_100"]
> + matrix = np.zeros((len(topk_order), len(batch_order)))
> +
> + # Fill matrix with data
> + query_dict = {}
> + for item in perf_data["query_data"]:
> + query_dict[(item["topk"], item["batch"])] = item["qps"]
> +
> + for i, topk in enumerate(topk_order):
> + for j, batch in enumerate(batch_order):
> + matrix[i, j] = query_dict.get((topk, batch), 0)
> +
> + # Plot heatmap
> + im = ax.imshow(matrix, cmap='viridis', aspect='auto')
> + ax.set_title("Milvus Query Performance Heatmap")
> + ax.set_xticks(range(len(batch_order)))
> + ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
> + ax.set_yticks(range(len(topk_order)))
> + ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
> +
> + # Add text annotations
> + for i in range(len(topk_order)):
> + for j in range(len(batch_order)):
> + ax.text(j, i, f'{matrix[i, j]:.0f}',
> + ha="center", va="center", color="white", fontweight="bold")
> +
> + # Add colorbar
> + cbar = plt.colorbar(im, ax=ax)
> + cbar.set_label('Queries Per Second (QPS)')
>
> plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
> + plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
> plt.close()
>
>
> -def generate_summary_statistics(results, output_dir):
> - """Generate summary statistics and save to JSON"""
> - summary = {
> - "total_tests": len(results),
> - "filesystems_tested": list(
> - set(r.get("filesystem", "unknown") for r in results)
> - ),
> - "configurations": {},
> - "performance_summary": {
> - "best_insert_qps": {"value": 0, "config": ""},
> - "best_query_qps": {"value": 0, "config": ""},
> - "average_insert_qps": 0,
> - "average_query_qps": 0,
> - },
> - }
> -
> - # Calculate statistics
> - all_insert_qps = []
> - all_query_qps = []
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - is_dev = "dev" if result.get("is_dev", False) else "baseline"
> - config_name = f"{fs}-{block_size}-{is_dev}"
> -
> - # Get actual performance metrics
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> -
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - all_insert_qps.append(insert_qps)
> - all_query_qps.append(query_qps)
> -
> - summary["configurations"][config_name] = {
> - "insert_qps": insert_qps,
> - "query_qps": query_qps,
> - "host": result.get("host", "unknown"),
> - }
> -
> - if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
> - summary["performance_summary"]["best_insert_qps"] = {
> - "value": insert_qps,
> - "config": config_name,
> - }
> -
> - if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
> - summary["performance_summary"]["best_query_qps"] = {
> - "value": query_qps,
> - "config": config_name,
> - }
> -
> - summary["performance_summary"]["average_insert_qps"] = (
> - np.mean(all_insert_qps) if all_insert_qps else 0
> - )
> - summary["performance_summary"]["average_query_qps"] = (
> - np.mean(all_query_qps) if all_query_qps else 0
> - )
> -
> - # Save summary
> - with open(os.path.join(output_dir, "summary.json"), "w") as f:
> - json.dump(summary, f, indent=2)
> -
> - return summary
> -
> -
> def main():
> if len(sys.argv) < 3:
> print("Usage: generate_graphs.py <results_dir> <output_dir>")
> @@ -642,37 +340,23 @@ def main():
> results_dir = sys.argv[1]
> output_dir = sys.argv[2]
>
> - # Create output directory
> + # Ensure output directory exists
> os.makedirs(output_dir, exist_ok=True)
>
> # Load results
> results = load_results(results_dir)
> -
> if not results:
> - print("No results found to analyze")
> + print(f"No valid results found in {results_dir}")
> sys.exit(1)
>
> print(f"Loaded {len(results)} result files")
>
> # Generate graphs
> - print("Generating performance heatmap...")
> - create_heatmap_analysis(results, output_dir)
> -
> - print("Generating performance trends...")
> create_simple_performance_trends(results, output_dir)
> + create_heatmap_analysis(results, output_dir)
>
> - print("Generating summary statistics...")
> - summary = generate_summary_statistics(results, output_dir)
> -
> - print(f"\nAnalysis complete! Graphs saved to {output_dir}")
> - print(f"Total configurations tested: {summary['total_tests']}")
> - print(
> - f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
> - )
> - print(
> - f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
> - )
> + print(f"Graphs generated in {output_dir}")
>
>
> if __name__ == "__main__":
> - main()
> + main()
> \ No newline at end of file
> diff --git a/playbooks/roles/ai_collect_results/files/generate_html_report.py b/playbooks/roles/ai_collect_results/files/generate_html_report.py
> index a205577c..01ec734c 100755
> --- a/playbooks/roles/ai_collect_results/files/generate_html_report.py
> +++ b/playbooks/roles/ai_collect_results/files/generate_html_report.py
> @@ -69,6 +69,24 @@ HTML_TEMPLATE = """
> color: #7f8c8d;
> font-size: 0.9em;
> }}
> + .config-box {{
> + background: #f8f9fa;
> + border-left: 4px solid #3498db;
> + padding: 15px;
> + margin: 20px 0;
> + border-radius: 4px;
> + }}
> + .config-box h3 {{
> + margin-top: 0;
> + color: #2c3e50;
> + }}
> + .config-box ul {{
> + margin: 10px 0;
> + padding-left: 20px;
> + }}
> + .config-box li {{
> + margin: 5px 0;
> + }}
> .section {{
> background: white;
> padding: 30px;
> @@ -162,15 +180,16 @@ HTML_TEMPLATE = """
> </head>
> <body>
> <div class="header">
> - <h1>AI Vector Database Benchmark Results</h1>
> + <h1>Milvus Vector Database Benchmark Results</h1>
> <div class="subtitle">Generated on {timestamp}</div>
> </div>
>
> <nav class="navigation">
> <ul>
> <li><a href="#summary">Summary</a></li>
> + {filesystem_nav_items}
> <li><a href="#performance-metrics">Performance Metrics</a></li>
> - <li><a href="#performance-trends">Performance Trends</a></li>
> + <li><a href="#performance-heatmap">Performance Heatmap</a></li>
> <li><a href="#detailed-results">Detailed Results</a></li>
> </ul>
> </nav>
> @@ -192,34 +211,40 @@ HTML_TEMPLATE = """
> <div class="label">{best_query_config}</div>
> </div>
> <div class="card">
> - <h3>Test Runs</h3>
> - <div class="value">{total_tests}</div>
> - <div class="label">Benchmark Executions</div>
> + <h3>{fourth_card_title}</h3>
> + <div class="value">{fourth_card_value}</div>
> + <div class="label">{fourth_card_label}</div>
> </div>
> </div>
>
> - <div id="performance-metrics" class="section">
> - <h2>Performance Metrics</h2>
> - <p>Key performance indicators for Milvus vector database operations.</p>
> + {filesystem_comparison_section}
> +
> + {block_size_analysis_section}
> +
> + <div id="performance-heatmap" class="section">
> + <h2>Performance Heatmap</h2>
> + <p>Heatmap visualization showing performance metrics across all tested configurations.</p>
> <div class="graph-container">
> - <img src="graphs/performance_heatmap.png" alt="Performance Metrics">
> + <img src="graphs/performance_heatmap.png" alt="Performance Heatmap">
> </div>
> </div>
>
> - <div id="performance-trends" class="section">
> - <h2>Performance Trends</h2>
> - <p>Performance comparison between baseline and development configurations.</p>
> - <div class="graph-container">
> - <img src="graphs/performance_trends.png" alt="Performance Trends">
> + <div id="performance-metrics" class="section">
> + <h2>Performance Metrics</h2>
> + {config_summary}
> + <div class="graph-grid">
> + {performance_trend_graphs}
> </div>
> </div>
>
> <div id="detailed-results" class="section">
> - <h2>Detailed Results Table</h2>
> + <h2>Milvus Performance by Storage Filesystem</h2>
> + <p>This table shows how the Milvus vector database performs when its data is stored on different filesystem types and configurations.</p>
> <table class="results-table">
> <thead>
> <tr>
> - <th>Host</th>
> + <th>Filesystem</th>
> + <th>Configuration</th>
> <th>Type</th>
> <th>Insert QPS</th>
> <th>Query QPS</th>
> @@ -260,51 +285,77 @@ def load_results(results_dir):
> data = json.load(f)
> # Get filesystem from JSON data first, then fallback to filename parsing
> filename = os.path.basename(json_file)
> -
> +
> # Skip results without valid performance data
> insert_perf = data.get("insert_performance", {})
> query_perf = data.get("query_performance", {})
> if not insert_perf or not query_perf:
> continue
> -
> +
> # Get filesystem from JSON data
> fs_type = data.get("filesystem", None)
> -
> - # If not in JSON, try to parse from filename (backwards compatibility)
> - if not fs_type and "debian13-ai" in filename:
> - host_parts = (
> - filename.replace("results_debian13-ai-", "")
> - .replace("_1.json", "")
> +
> + # Always try to parse from filename first since JSON data might be wrong
> + if "-ai-" in filename:
> + # Handle both debian13-ai- and prod-ai- prefixes
> + cleaned_filename = filename.replace("results_", "")
> +
> + # Extract the part after -ai-
> + if "debian13-ai-" in cleaned_filename:
> + host_part = cleaned_filename.replace("debian13-ai-", "")
> + elif "prod-ai-" in cleaned_filename:
> + host_part = cleaned_filename.replace("prod-ai-", "")
> + else:
> + # Generic extraction
> + ai_index = cleaned_filename.find("-ai-")
> + if ai_index != -1:
> + host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-"
> + else:
> + host_part = cleaned_filename
> +
> + # Remove file extensions and dev suffix
> + host_part = (
> + host_part.replace("_1.json", "")
> .replace("_2.json", "")
> .replace("_3.json", "")
> - .split("-")
> + .replace("-dev", "")
> )
> - if "xfs" in host_parts[0]:
> +
> + # Parse filesystem type and block size
> + if host_part.startswith("xfs-"):
> fs_type = "xfs"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "ext4" in host_parts[0]:
> + # Extract block size: xfs-4k-4ks -> 4k
> + parts = host_part.split("-")
> + if len(parts) >= 2:
> + block_size = parts[1] # 4k, 16k, 32k, 64k
> + else:
> + block_size = "4k"
> + elif host_part.startswith("ext4-"):
> fs_type = "ext4"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "btrfs" in host_parts[0]:
> + parts = host_part.split("-")
> + block_size = parts[1] if len(parts) > 1 else "4k"
> + elif host_part.startswith("btrfs"):
> fs_type = "btrfs"
> block_size = "default"
> else:
> - fs_type = "unknown"
> - block_size = "unknown"
> + # Fallback to JSON data if available
> + if not fs_type:
> + fs_type = "unknown"
> + block_size = "unknown"
> else:
> # Set appropriate block size based on filesystem
> if fs_type == "btrfs":
> block_size = "default"
> else:
> block_size = data.get("block_size", "default")
> -
> +
> # Default to unknown if still not found
> if not fs_type:
> fs_type = "unknown"
> block_size = "unknown"
> -
> +
> is_dev = "dev" in filename
> -
> +
> # Calculate average QPS from query performance data
> query_qps = 0
> query_count = 0
> @@ -316,7 +367,7 @@ def load_results(results_dir):
> query_count += 1
> if query_count > 0:
> query_qps = query_qps / query_count
> -
> +
> results.append(
> {
> "host": filename.replace("results_", "").replace(".json", ""),
> @@ -348,12 +399,36 @@ def generate_table_rows(results, best_configs):
> if config_key in best_configs:
> row_class += " best-config"
>
> + # Generate descriptive labels showing Milvus is running on this filesystem
> + if result["filesystem"] == "xfs" and result["block_size"] != "default":
> + storage_label = f"XFS {result['block_size'].upper()}"
> + config_details = f"Block size: {result['block_size']}, Milvus data on XFS"
> + elif result["filesystem"] == "ext4":
> + storage_label = "EXT4"
> + if "bigalloc" in result.get("host", "").lower():
> + config_details = "EXT4 with bigalloc, Milvus data on ext4"
> + else:
> + config_details = (
> + f"Block size: {result['block_size']}, Milvus data on ext4"
> + )
> + elif result["filesystem"] == "btrfs":
> + storage_label = "BTRFS"
> + config_details = "Default Btrfs settings, Milvus data on Btrfs"
> + else:
> + storage_label = result["filesystem"].upper()
> + config_details = f"Milvus data on {result['filesystem']}"
> +
> + # Extract clean node identifier from hostname
> + node_name = result["host"].replace("results_", "").replace(".json", "")
> +
> row = f"""
> <tr class="{row_class}">
> - <td>{result['host']}</td>
> + <td><strong>{storage_label}</strong></td>
> + <td>{config_details}</td>
> <td>{result['type']}</td>
> <td>{result['insert_qps']:,}</td>
> <td>{result['query_qps']:,}</td>
> + <td><code>{node_name}</code></td>
> <td>{result['timestamp']}</td>
> </tr>
> """
> @@ -362,10 +437,66 @@ def generate_table_rows(results, best_configs):
> return "\n".join(rows)
>
>
> +def generate_config_summary(results_dir):
> + """Generate configuration summary HTML from results"""
> + # Try to load first result file to get configuration
> + result_files = glob.glob(os.path.join(results_dir, "results_*.json"))
> + if not result_files:
> + return ""
> +
> + try:
> + with open(result_files[0], "r") as f:
> + data = json.load(f)
> + config = data.get("config", {})
> +
> + # Format configuration details
> + config_html = """
> + <div class="config-box">
> + <h3>Test Configuration</h3>
> + <ul>
> + <li><strong>Vector Dataset Size:</strong> {:,} vectors</li>
> + <li><strong>Vector Dimensions:</strong> {}</li>
> + <li><strong>Index Type:</strong> {} (M={}, ef_construction={}, ef={})</li>
> + <li><strong>Benchmark Runtime:</strong> {} seconds</li>
> + <li><strong>Batch Size:</strong> {:,}</li>
> + <li><strong>Test Iterations:</strong> {} runs with identical configuration</li>
> + </ul>
> + </div>
> + """.format(
> + config.get("vector_dataset_size", "N/A"),
> + config.get("vector_dimensions", "N/A"),
> + config.get("index_type", "N/A"),
> + config.get("index_hnsw_m", "N/A"),
> + config.get("index_hnsw_ef_construction", "N/A"),
> + config.get("index_hnsw_ef", "N/A"),
> + config.get("benchmark_runtime", "N/A"),
> + config.get("batch_size", "N/A"),
> + len(result_files),
> + )
> + return config_html
> + except Exception as e:
> + print(f"Warning: Could not generate config summary: {e}")
> + return ""
> +
> +
> def find_performance_trend_graphs(graphs_dir):
> - """Find performance trend graph"""
> - # Not used in basic implementation since we embed the graph directly
> - return ""
> + """Find performance trend graphs"""
> + graphs = []
> + # Look for filesystem-specific graphs in multi-fs mode
> + for fs in ["xfs", "ext4", "btrfs"]:
> + graph_path = f"{fs}_performance_trends.png"
> + if os.path.exists(os.path.join(graphs_dir, graph_path)):
> + graphs.append(
> + f'<div class="graph-container"><img src="graphs/{graph_path}" alt="{fs.upper()} Performance Trends"></div>'
> + )
> + # Fallback to simple performance trends for single mode
> + if not graphs and os.path.exists(
> + os.path.join(graphs_dir, "performance_trends.png")
> + ):
> + graphs.append(
> + '<div class="graph-container"><img src="graphs/performance_trends.png" alt="Performance Trends"></div>'
> + )
> + return "\n".join(graphs)
>
>
> def generate_html_report(results_dir, graphs_dir, output_path):
> @@ -393,6 +524,50 @@ def generate_html_report(results_dir, graphs_dir, output_path):
> if summary["performance_summary"]["best_query_qps"]["config"]:
> best_configs.add(summary["performance_summary"]["best_query_qps"]["config"])
>
> + # Check if multi-filesystem testing is enabled (more than one filesystem)
> + filesystems_tested = summary.get("filesystems_tested", [])
> + is_multifs_enabled = len(filesystems_tested) > 1
> +
> + # Generate conditional sections based on multi-fs status
> + if is_multifs_enabled:
> + filesystem_nav_items = """
> + <li><a href="#filesystem-comparison">Filesystem Comparison</a></li>
> + <li><a href="#block-size-analysis">Block Size Analysis</a></li>"""
> +
> + filesystem_comparison_section = """<div id="filesystem-comparison" class="section">
> + <h2>Milvus Storage Filesystem Comparison</h2>
> + <p>Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.</p>
> + <div class="graph-container">
> + <img src="graphs/filesystem_comparison.png" alt="Filesystem Comparison">
> + </div>
> + </div>"""
> +
> + block_size_analysis_section = """<div id="block-size-analysis" class="section">
> + <h2>XFS Block Size Analysis</h2>
> + <p>Performance analysis of XFS filesystem with different block sizes (4K, 16K, 32K, 64K).</p>
> + <div class="graph-container">
> + <img src="graphs/xfs_block_size_analysis.png" alt="XFS Block Size Analysis">
> + </div>
> + </div>"""
> +
> + # Multi-fs mode: show filesystem info
> + fourth_card_title = "Storage Filesystems"
> + fourth_card_value = str(len(filesystems_tested))
> + fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data"
> + else:
> + # Single filesystem mode - hide multi-fs sections
> + filesystem_nav_items = ""
> + filesystem_comparison_section = ""
> + block_size_analysis_section = ""
> +
> + # Single mode: show test iterations
> + fourth_card_title = "Test Iterations"
> + fourth_card_value = str(summary["total_tests"])
> + fourth_card_label = "Identical Configuration Runs"
> +
> + # Generate configuration summary
> + config_summary = generate_config_summary(results_dir)
> +
> # Generate HTML
> html_content = HTML_TEMPLATE.format(
> timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
> @@ -401,6 +576,14 @@ def generate_html_report(results_dir, graphs_dir, output_path):
> best_insert_config=summary["performance_summary"]["best_insert_qps"]["config"],
> best_query_qps=f"{summary['performance_summary']['best_query_qps']['value']:,}",
> best_query_config=summary["performance_summary"]["best_query_qps"]["config"],
> + fourth_card_title=fourth_card_title,
> + fourth_card_value=fourth_card_value,
> + fourth_card_label=fourth_card_label,
> + filesystem_nav_items=filesystem_nav_items,
> + filesystem_comparison_section=filesystem_comparison_section,
> + block_size_analysis_section=block_size_analysis_section,
> + config_summary=config_summary,
> + performance_trend_graphs=find_performance_trend_graphs(graphs_dir),
> table_rows=generate_table_rows(results, best_configs),
> )
>
> diff --git a/playbooks/roles/ai_collect_results/tasks/main.yml b/playbooks/roles/ai_collect_results/tasks/main.yml
> index 6a15d89c..9586890a 100644
> --- a/playbooks/roles/ai_collect_results/tasks/main.yml
> +++ b/playbooks/roles/ai_collect_results/tasks/main.yml
> @@ -134,13 +134,22 @@
> ansible.builtin.command: >
> python3 {{ local_scripts_dir }}/analyze_results.py
> --results-dir {{ local_results_dir }}
> - --output-dir {{ local_results_dir }}
> + --output-dir {{ local_results_dir }}/graphs
> {% if ai_benchmark_enable_graphing | bool %}--config {{ local_scripts_dir }}/analysis_config.json{% endif %}
> register: analysis_result
> run_once: true
> delegate_to: localhost
> when: collected_results.files is defined and collected_results.files | length > 0
> tags: ['results', 'analysis']
> + failed_when: analysis_result.rc != 0
> +
> +- name: Display analysis script output
> + ansible.builtin.debug:
> + var: analysis_result
> + run_once: true
> + delegate_to: localhost
> + when: collected_results.files is defined and collected_results.files | length > 0
> + tags: ['results', 'analysis']
>
>
> - name: Create graphs directory
> @@ -155,35 +164,8 @@
> - collected_results.files | length > 0
> tags: ['results', 'graphs']
>
> -- name: Generate performance graphs
> - ansible.builtin.command: >
> - python3 {{ local_scripts_dir }}/generate_better_graphs.py
> - {{ local_results_dir }}
> - {{ local_results_dir }}/graphs
> - register: graph_generation_result
> - failed_when: false
> - run_once: true
> - delegate_to: localhost
> - when:
> - - collected_results.files is defined
> - - collected_results.files | length > 0
> - - ai_benchmark_enable_graphing|bool
> - tags: ['results', 'graphs']
> -
> -- name: Fallback to basic graphs if better graphs fail
> - ansible.builtin.command: >
> - python3 {{ local_scripts_dir }}/generate_graphs.py
> - {{ local_results_dir }}
> - {{ local_results_dir }}/graphs
> - run_once: true
> - delegate_to: localhost
> - when:
> - - collected_results.files is defined
> - - collected_results.files | length > 0
> - - ai_benchmark_enable_graphing|bool
> - - graph_generation_result is defined
> - - graph_generation_result.rc != 0
> - tags: ['results', 'graphs']
> +# Graph generation is now handled by analyze_results.py above
> +# No separate graph generation step needed
>
> - name: Generate HTML report
> ansible.builtin.command: >
> diff --git a/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2 b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
> index 5a879649..459cd602 100644
> --- a/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
> +++ b/playbooks/roles/ai_collect_results/templates/analysis_config.json.j2
> @@ -2,5 +2,5 @@
> "enable_graphing": {{ ai_benchmark_enable_graphing|default(true)|lower }},
> "graph_format": "{{ ai_benchmark_graph_format|default('png') }}",
> "graph_dpi": {{ ai_benchmark_graph_dpi|default(150) }},
> - "graph_theme": "{{ ai_benchmark_graph_theme|default('seaborn') }}"
> + "graph_theme": "{{ ai_benchmark_graph_theme|default('default') }}"
> }
> diff --git a/playbooks/roles/ai_milvus_storage/tasks/main.yml b/playbooks/roles/ai_milvus_storage/tasks/main.yml
> new file mode 100644
> index 00000000..f8e4ea63
> --- /dev/null
> +++ b/playbooks/roles/ai_milvus_storage/tasks/main.yml
> @@ -0,0 +1,161 @@
> +---
> +- name: Import optional extra_args file
> + include_vars: "{{ item }}"
> + ignore_errors: yes
> + with_items:
> + - "../extra_vars.yaml"
> + tags: vars
> +
> +- name: Milvus storage setup
> + when: ai_milvus_storage_enable|bool
> + block:
> + - name: Install filesystem utilities
> + package:
> + name:
> + - xfsprogs
> + - e2fsprogs
> + - btrfs-progs
> + state: present
> + become: yes
> + become_method: sudo
> +
> + - name: Check if device exists
> + stat:
> + path: "{{ ai_milvus_device }}"
> + register: milvus_device_stat
> + failed_when: not milvus_device_stat.stat.exists
> +
> + - name: Check if Milvus storage is already mounted
> + command: mountpoint -q {{ ai_milvus_mount_point }}
> + register: milvus_mount_check
> + changed_when: false
> + failed_when: false
> +
> + - name: Setup Milvus storage filesystem
> + when: milvus_mount_check.rc != 0
> + block:
> + - name: Create Milvus mount point directory
> + file:
> + path: "{{ ai_milvus_mount_point }}"
> + state: directory
> + mode: '0755'
> + become: yes
> + become_method: sudo
> +
> + - name: Detect filesystem type from node name
> + set_fact:
> + detected_fstype: >-
> + {%- if 'xfs' in inventory_hostname -%}
> + xfs
> + {%- elif 'ext4' in inventory_hostname -%}
> + ext4
> + {%- elif 'btrfs' in inventory_hostname -%}
> + btrfs
> + {%- else -%}
> + {{ ai_milvus_fstype | default('xfs') }}
> + {%- endif -%}
> + when: ai_milvus_use_node_fs | default(false) | bool
> +
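> + # XFS block size (16k/32k/64k) and sector size (4ks/512s) are inferred from the node name below.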
> + - name: Detect XFS parameters from node name
> + set_fact:
> + milvus_xfs_blocksize: >-
> + {%- if '64k' in inventory_hostname -%}
> + 65536
> + {%- elif '32k' in inventory_hostname -%}
> + 32768
> + {%- elif '16k' in inventory_hostname -%}
> + 16384
> + {%- else -%}
> + {{ ai_milvus_xfs_blocksize | default(4096) }}
> + {%- endif -%}
> + milvus_xfs_sectorsize: >-
> + {%- if '4ks' in inventory_hostname -%}
> + 4096
> + {%- elif '512s' in inventory_hostname -%}
> + 512
> + {%- else -%}
> + {{ ai_milvus_xfs_sectorsize | default(4096) }}
> + {%- endif -%}
> + when:
> + - ai_milvus_use_node_fs | default(false) | bool
> + - detected_fstype | default(ai_milvus_fstype) == 'xfs'
> +
> + - name: Detect ext4 parameters from node name
> + set_fact:
> + milvus_ext4_opts: >-
> + {%- if '16k' in inventory_hostname and 'bigalloc' in inventory_hostname -%}
> + -F -b 4096 -C 16384 -O bigalloc
> + {%- elif '4k' in inventory_hostname -%}
> + -F -b 4096
> + {%- else -%}
> + {{ ai_milvus_ext4_mkfs_opts | default('-F') }}
> + {%- endif -%}
> + when:
> + - ai_milvus_use_node_fs | default(false) | bool
> + - detected_fstype | default(ai_milvus_fstype) == 'ext4'
> +
> + - name: Set final filesystem type
> + set_fact:
> + milvus_fstype: "{{ detected_fstype | default(ai_milvus_fstype | default('xfs')) }}"
> +
> + - name: Format device with XFS
> + command: >
> + mkfs.xfs -f
> + -b size={{ milvus_xfs_blocksize | default(ai_milvus_xfs_blocksize | default(4096)) }}
> + -s size={{ milvus_xfs_sectorsize | default(ai_milvus_xfs_sectorsize | default(4096)) }}
> + {{ ai_milvus_xfs_mkfs_opts | default('') }}
> + {{ ai_milvus_device }}
> + when: milvus_fstype == "xfs"
> + become: yes
> + become_method: sudo
> +
> + - name: Format device with Btrfs
> + command: mkfs.btrfs {{ ai_milvus_btrfs_mkfs_opts | default('-f') }} {{ ai_milvus_device }}
> + when: milvus_fstype == "btrfs"
> + become: yes
> + become_method: sudo
> +
> + - name: Format device with ext4
> + command: mkfs.ext4 {{ milvus_ext4_opts | default(ai_milvus_ext4_mkfs_opts | default('-F')) }} {{ ai_milvus_device }}
> + when: milvus_fstype == "ext4"
> + become: yes
> + become_method: sudo
> +
> + - name: Mount Milvus storage filesystem
> + mount:
> + path: "{{ ai_milvus_mount_point }}"
> + src: "{{ ai_milvus_device }}"
> + fstype: "{{ milvus_fstype }}"
> + opts: defaults,noatime
> + state: mounted
> + become: yes
> + become_method: sudo
> +
> + - name: Add Milvus storage mount to fstab
> + mount:
> + path: "{{ ai_milvus_mount_point }}"
> + src: "{{ ai_milvus_device }}"
> + fstype: "{{ milvus_fstype }}"
> + opts: defaults,noatime
> + state: present
> + become: yes
> + become_method: sudo
> +
> + - name: Ensure Milvus directories exist with proper permissions
> + file:
> + path: "{{ item }}"
> + state: directory
> + mode: '0755'
> + owner: root
> + group: root
> + become: yes
> + become_method: sudo
> + loop:
> + - "{{ ai_milvus_mount_point }}"
> + - "{{ ai_milvus_mount_point }}/data"
> + - "{{ ai_milvus_mount_point }}/etcd"
> + - "{{ ai_milvus_mount_point }}/minio"
> +
> + - name: Display Milvus storage setup complete
> + debug:
> + msg: "Milvus storage has been prepared at: {{ ai_milvus_mount_point }} with filesystem: {{ milvus_fstype | default(ai_milvus_fstype | default('xfs')) }}"
> diff --git a/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml b/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
> new file mode 100644
> index 00000000..b4453b81
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_run/tasks/generate_comparison.yml
> @@ -0,0 +1,279 @@
> +---
> +- name: Create multi-filesystem comparison script
> + copy:
> + content: |
> + #!/usr/bin/env python3
> + """
> + Multi-Filesystem AI Benchmark Comparison Report Generator
> +
> + This script analyzes AI benchmark results across different filesystem
> + configurations and generates a comprehensive comparison report.
> + """
> +
> + import json
> + import glob
> + import os
> + import sys
> + from datetime import datetime
> + from typing import Dict, List, Any
> +
> + def load_filesystem_results(results_dir: str) -> Dict[str, Any]:
> + """Load results from all filesystem configurations"""
> + fs_results = {}
> +
> + # Find all filesystem configuration directories
> + fs_dirs = [d for d in os.listdir(results_dir)
> + if os.path.isdir(os.path.join(results_dir, d)) and d != 'comparison']
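> + # Each per-filesystem run writes into a directory named after its configuration; skip the comparison output dir itself.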
> +
> + for fs_name in fs_dirs:
> + fs_path = os.path.join(results_dir, fs_name)
> +
> + # Load configuration
> + config_file = os.path.join(fs_path, 'filesystem_config.txt')
> + config_info = {}
> + if os.path.exists(config_file):
> + with open(config_file, 'r') as f:
> + config_info['config_text'] = f.read()
> +
> + # Load benchmark results
> + result_files = glob.glob(os.path.join(fs_path, 'results_*.json'))
> + benchmark_results = []
> +
> + for result_file in result_files:
> + try:
> + with open(result_file, 'r') as f:
> + data = json.load(f)
> + benchmark_results.append(data)
> + except Exception as e:
> + print(f"Error loading {result_file}: {e}")
> +
> + fs_results[fs_name] = {
> + 'config': config_info,
> + 'results': benchmark_results,
> + 'path': fs_path
> + }
> +
> + return fs_results
> +
> + def generate_comparison_report(fs_results: Dict[str, Any], output_dir: str):
> + """Generate HTML comparison report"""
> + html = []
> +
> + # HTML header
> + html.append("<!DOCTYPE html>")
> + html.append("<html lang='en'>")
> + html.append("<head>")
> + html.append(" <meta charset='UTF-8'>")
> + html.append(" <title>AI Multi-Filesystem Benchmark Comparison</title>")
> + html.append(" <style>")
> + html.append(" body { font-family: Arial, sans-serif; margin: 20px; }")
> + html.append(" .header { background-color: #f0f8ff; padding: 20px; border-radius: 5px; margin-bottom: 20px; }")
> + html.append(" .fs-section { margin-bottom: 30px; border: 1px solid #ddd; padding: 15px; border-radius: 5px; }")
> + html.append(" .comparison-table { width: 100%; border-collapse: collapse; margin: 20px 0; }")
> + html.append(" .comparison-table th, .comparison-table td { border: 1px solid #ddd; padding: 8px; text-align: left; }")
> + html.append(" .comparison-table th { background-color: #f2f2f2; }")
> + html.append(" .metric-best { background-color: #d4edda; font-weight: bold; }")
> + html.append(" .metric-worst { background-color: #f8d7da; }")
> + html.append(" .chart-container { margin: 20px 0; padding: 15px; background-color: #f9f9f9; border-radius: 5px; }")
> + html.append(" </style>")
> + html.append("</head>")
> + html.append("<body>")
> +
> + # Report header
> + html.append(" <div class='header'>")
> + html.append(" <h1>🗂️ AI Multi-Filesystem Benchmark Comparison</h1>")
> + html.append(f" <p><strong>Generated:</strong> {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</p>")
> + html.append(f" <p><strong>Filesystem Configurations Tested:</strong> {len(fs_results)}</p>")
> + html.append(" </div>")
> +
> + # Performance comparison table
> + html.append(" <h2>📊 Performance Comparison Summary</h2>")
> + html.append(" <table class='comparison-table'>")
> + html.append(" <tr>")
> + html.append(" <th>Filesystem</th>")
> + html.append(" <th>Avg Insert Rate (vectors/sec)</th>")
> + html.append(" <th>Avg Index Time (sec)</th>")
> + html.append(" <th>Avg Query QPS (Top-10, Batch-1)</th>")
> + html.append(" <th>Avg Query Latency (ms)</th>")
> + html.append(" </tr>")
> +
> + # Calculate metrics for comparison
> + fs_metrics = {}
> + for fs_name, fs_data in fs_results.items():
> + if not fs_data['results']:
> + continue
> +
> + # Calculate averages across all iterations
> + insert_rates = []
> + index_times = []
> + query_qps = []
> + query_latencies = []
> +
> + for result in fs_data['results']:
> + if 'insert_performance' in result:
> + insert_rates.append(result['insert_performance'].get('vectors_per_second', 0))
> +
> + if 'index_performance' in result:
> + index_times.append(result['index_performance'].get('creation_time_seconds', 0))
> +
> + if 'query_performance' in result:
> + qp = result['query_performance']
> + if 'topk_10' in qp and 'batch_1' in qp['topk_10']:
> + batch_data = qp['topk_10']['batch_1']
> + query_qps.append(batch_data.get('queries_per_second', 0))
> + query_latencies.append(batch_data.get('average_time_seconds', 0) * 1000)
> +
> + fs_metrics[fs_name] = {
> + 'insert_rate': sum(insert_rates) / len(insert_rates) if insert_rates else 0,
> + 'index_time': sum(index_times) / len(index_times) if index_times else 0,
> + 'query_qps': sum(query_qps) / len(query_qps) if query_qps else 0,
> + 'query_latency': sum(query_latencies) / len(query_latencies) if query_latencies else 0
> + }
> +
> + # Find best/worst for highlighting
> + if fs_metrics:
> + best_insert = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['insert_rate'])
> + best_index = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['index_time'])
> + best_qps = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_qps'])
> + best_latency = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_latency'])
> +
> + worst_insert = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['insert_rate'])
> + worst_index = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['index_time'])
> + worst_qps = min(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_qps'])
> + worst_latency = max(fs_metrics.keys(), key=lambda x: fs_metrics[x]['query_latency'])
> +
> + # Generate comparison rows
> + for fs_name, metrics in fs_metrics.items():
> + html.append(" <tr>")
> + html.append(f" <td><strong>{fs_name}</strong></td>")
> +
> + # Insert rate
> + cell_class = ""
> + if fs_name == best_insert:
> + cell_class = "metric-best"
> + elif fs_name == worst_insert:
> + cell_class = "metric-worst"
> + html.append(f" <td class='{cell_class}'>{metrics['insert_rate']:.2f}</td>")
> +
> + # Index time
> + cell_class = ""
> + if fs_name == best_index:
> + cell_class = "metric-best"
> + elif fs_name == worst_index:
> + cell_class = "metric-worst"
> + html.append(f" <td class='{cell_class}'>{metrics['index_time']:.2f}</td>")
> +
> + # Query QPS
> + cell_class = ""
> + if fs_name == best_qps:
> + cell_class = "metric-best"
> + elif fs_name == worst_qps:
> + cell_class = "metric-worst"
> + html.append(f" <td class='{cell_class}'>{metrics['query_qps']:.2f}</td>")
> +
> + # Query latency
> + cell_class = ""
> + if fs_name == best_latency:
> + cell_class = "metric-best"
> + elif fs_name == worst_latency:
> + cell_class = "metric-worst"
> + html.append(f" <td class='{cell_class}'>{metrics['query_latency']:.2f}</td>")
> +
> + html.append(" </tr>")
> +
> + html.append(" </table>")
> +
> + # Individual filesystem details
> + html.append(" <h2>📁 Individual Filesystem Details</h2>")
> + for fs_name, fs_data in fs_results.items():
> + html.append(f" <div class='fs-section'>")
> + html.append(f" <h3>{fs_name}</h3>")
> +
> + if 'config_text' in fs_data['config']:
> + html.append(" <h4>Configuration:</h4>")
> + html.append(" <pre>" + fs_data['config']['config_text'][:500] + "</pre>")
> +
> + html.append(f" <p><strong>Benchmark Iterations:</strong> {len(fs_data['results'])}</p>")
> +
> + if fs_name in fs_metrics:
> + metrics = fs_metrics[fs_name]
> + html.append(" <table class='comparison-table'>")
> + html.append(" <tr><th>Metric</th><th>Value</th></tr>")
> + html.append(f" <tr><td>Average Insert Rate</td><td>{metrics['insert_rate']:.2f} vectors/sec</td></tr>")
> + html.append(f" <tr><td>Average Index Time</td><td>{metrics['index_time']:.2f} seconds</td></tr>")
> + html.append(f" <tr><td>Average Query QPS</td><td>{metrics['query_qps']:.2f}</td></tr>")
> + html.append(f" <tr><td>Average Query Latency</td><td>{metrics['query_latency']:.2f} ms</td></tr>")
> + html.append(" </table>")
> +
> + html.append(" </div>")
> +
> + # Footer
> + html.append(" <div style='margin-top: 40px; padding: 20px; background-color: #f8f9fa; border-radius: 5px;'>")
> + html.append(" <h3>📝 Analysis Notes</h3>")
> + html.append(" <ul>")
> + html.append(" <li>Green highlighting indicates the best performing filesystem for each metric</li>")
> + html.append(" <li>Red highlighting indicates the worst performing filesystem for each metric</li>")
> + html.append(" <li>Results are averaged across all benchmark iterations for each filesystem</li>")
> + html.append(" <li>Performance can vary based on hardware, kernel version, and workload characteristics</li>")
> + html.append(" </ul>")
> + html.append(" </div>")
> +
> + html.append("</body>")
> + html.append("</html>")
> +
> + # Write HTML report
> + report_file = os.path.join(output_dir, "multi_filesystem_comparison.html")
> + with open(report_file, 'w') as f:
> + f.write("\n".join(html))
> +
> + print(f"Multi-filesystem comparison report generated: {report_file}")
> +
> + # Generate JSON summary
> + summary_data = {
> + 'generation_time': datetime.now().isoformat(),
> + 'filesystem_count': len(fs_results),
> + 'metrics_summary': fs_metrics,
> + 'raw_results': {fs: data['results'] for fs, data in fs_results.items()}
> + }
> +
> + summary_file = os.path.join(output_dir, "multi_filesystem_summary.json")
> + with open(summary_file, 'w') as f:
> + json.dump(summary_data, f, indent=2)
> +
> + print(f"Multi-filesystem summary data: {summary_file}")
> +
> + def main():
> + results_dir = "{{ ai_multifs_results_dir }}"
> + comparison_dir = os.path.join(results_dir, "comparison")
> + os.makedirs(comparison_dir, exist_ok=True)
> +
> + print("Loading filesystem results...")
> + fs_results = load_filesystem_results(results_dir)
> +
> + if not fs_results:
> + print("No filesystem results found!")
> + return 1
> +
> + print(f"Found results for {len(fs_results)} filesystem configurations")
> + print("Generating comparison report...")
> +
> + generate_comparison_report(fs_results, comparison_dir)
> +
> + print("Multi-filesystem comparison completed!")
> + return 0
> +
> + if __name__ == "__main__":
> + sys.exit(main())
> + dest: "{{ ai_multifs_results_dir }}/generate_comparison.py"
> + mode: '0755'
> +
> +- name: Run multi-filesystem comparison analysis
> + command: python3 {{ ai_multifs_results_dir }}/generate_comparison.py
> + register: comparison_result
> +
> +- name: Display comparison completion message
> + debug:
> + msg: |
> + Multi-filesystem comparison completed!
> + Comparison report: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_comparison.html
> + Summary data: {{ ai_multifs_results_dir }}/comparison/multi_filesystem_summary.json
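
A note on consuming the artifacts above: multi_filesystem_summary.json carries
generation_time, filesystem_count, metrics_summary and raw_results, so the
averaged numbers can be pulled back out without re-parsing the HTML. A minimal
Python sketch, assuming the default AI_MULTIFS_RESULTS_DIR of
/data/ai-multifs-benchmark (adjust the path if you changed it):

import json

with open("/data/ai-multifs-benchmark/comparison/multi_filesystem_summary.json") as f:
    summary = json.load(f)

# metrics_summary maps each filesystem name to the averaged metrics shown above
for fs_name, metrics in summary["metrics_summary"].items():
    print(f"{fs_name}: {metrics['insert_rate']:.2f} vectors/sec insert, "
          f"{metrics['query_qps']:.2f} QPS")
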
> diff --git a/playbooks/roles/ai_multifs_run/tasks/main.yml b/playbooks/roles/ai_multifs_run/tasks/main.yml
> new file mode 100644
> index 00000000..38dbba12
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_run/tasks/main.yml
> @@ -0,0 +1,23 @@
> +---
> +- name: Import optional extra_args file
> + include_vars: "{{ item }}"
> + ignore_errors: yes
> + with_items:
> + - "../extra_vars.yaml"
> + tags: vars
> +
> +- name: Filter enabled filesystem configurations
> + set_fact:
> + enabled_fs_configs: "{{ ai_multifs_configurations | selectattr('enabled', 'equalto', true) | list }}"
> +
> +- name: Run AI benchmarks on each filesystem configuration
> + include_tasks: run_single_filesystem.yml
> + loop: "{{ enabled_fs_configs }}"
> + loop_control:
> + loop_var: fs_config
> + index_var: fs_index
> + when: enabled_fs_configs | length > 0
> +
> +- name: Generate multi-filesystem comparison report
> + include_tasks: generate_comparison.yml
> + when: enabled_fs_configs | length > 1
> diff --git a/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml b/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
> new file mode 100644
> index 00000000..fd194550
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_run/tasks/run_single_filesystem.yml
> @@ -0,0 +1,104 @@
> +---
> +- name: Display current filesystem configuration
> + debug:
> + msg: "Testing filesystem configuration {{ fs_index + 1 }}/{{ enabled_fs_configs | length }}: {{ fs_config.name }}"
> +
> +- name: Unmount filesystem if mounted
> + mount:
> + path: "{{ ai_multifs_mount_point }}"
> + state: unmounted
> + ignore_errors: yes
> +
> +- name: Create filesystem with specific configuration
> + shell: "{{ fs_config.mkfs_cmd }} {{ ai_multifs_device }}"
> + register: mkfs_result
> +
> +- name: Display mkfs output
> + debug:
> + msg: "mkfs output: {{ mkfs_result.stdout }}"
> + when: mkfs_result.stdout != ""
> +
> +- name: Mount filesystem with specific options
> + mount:
> + path: "{{ ai_multifs_mount_point }}"
> + src: "{{ ai_multifs_device }}"
> + fstype: "{{ fs_config.filesystem }}"
> + opts: "{{ fs_config.mount_opts }}"
> + state: mounted
> +
> +- name: Create filesystem-specific results directory
> + file:
> + path: "{{ ai_multifs_results_dir }}/{{ fs_config.name }}"
> + state: directory
> + mode: '0755'
> +
> +- name: Update AI benchmark configuration for current filesystem
> + set_fact:
> + current_fs_benchmark_dir: "{{ ai_multifs_mount_point }}/ai-benchmark-data"
> + current_fs_results_dir: "{{ ai_multifs_results_dir }}/{{ fs_config.name }}"
> +
> +- name: Create AI benchmark data directory on current filesystem
> + file:
> + path: "{{ current_fs_benchmark_dir }}"
> + state: directory
> + mode: '0755'
> +
> +- name: Generate AI benchmark configuration for current filesystem
> + template:
> + src: milvus_config.json.j2
> + dest: "{{ current_fs_results_dir }}/milvus_config.json"
> + mode: '0644'
> +
> +- name: Run AI benchmark on current filesystem
> + shell: |
> + cd {{ current_fs_benchmark_dir }}
> + python3 {{ playbook_dir }}/roles/ai_run_benchmarks/files/milvus_benchmark.py \
> + --config {{ current_fs_results_dir }}/milvus_config.json \
> + --output {{ current_fs_results_dir }}/results_{{ fs_config.name }}_$(date +%Y%m%d_%H%M%S).json
> + register: benchmark_result
> + async: 7200 # 2 hour timeout
> + poll: 30
> +
> +- name: Display benchmark completion
> + debug:
> + msg: "Benchmark completed for {{ fs_config.name }}: {{ benchmark_result.stdout_lines[-5:] | default(['No output']) }}"
> +
> +- name: Record filesystem configuration metadata
> + copy:
> + content: |
> + # Filesystem Configuration: {{ fs_config.name }}
> + Filesystem Type: {{ fs_config.filesystem }}
> + mkfs Command: {{ fs_config.mkfs_cmd }}
> + Mount Options: {{ fs_config.mount_opts }}
> + Device: {{ ai_multifs_device }}
> + Mount Point: {{ ai_multifs_mount_point }}
> + Data Directory: {{ current_fs_benchmark_dir }}
> + Results Directory: {{ current_fs_results_dir }}
> + Test Start Time: {{ ansible_date_time.iso8601 }}
> +
> + mkfs Output:
> + {{ mkfs_result.stdout }}
> + {{ mkfs_result.stderr }}
> + dest: "{{ current_fs_results_dir }}/filesystem_config.txt"
> + mode: '0644'
> +
> +- name: Capture filesystem statistics after benchmark
> + shell: |
> + echo "=== Filesystem Usage ===" > {{ current_fs_results_dir }}/filesystem_stats.txt
> + df -h {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt
> + echo "" >> {{ current_fs_results_dir }}/filesystem_stats.txt
> + echo "=== Filesystem Info ===" >> {{ current_fs_results_dir }}/filesystem_stats.txt
> + {% if fs_config.filesystem == 'xfs' %}
> + xfs_info {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
> + {% elif fs_config.filesystem == 'ext4' %}
> + tune2fs -l {{ ai_multifs_device }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
> + {% elif fs_config.filesystem == 'btrfs' %}
> + btrfs filesystem show {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
> + btrfs filesystem usage {{ ai_multifs_mount_point }} >> {{ current_fs_results_dir }}/filesystem_stats.txt 2>&1 || true
> + {% endif %}
> + ignore_errors: yes
> +
> +- name: Unmount filesystem after benchmark
> + mount:
> + path: "{{ ai_multifs_mount_point }}"
> + state: unmounted
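
For orientation, each pass through this loop leaves a small, fixed set of files
under {{ ai_multifs_results_dir }}/<fs_config.name>. A quick Python sketch of
what to expect on disk; the xfs_16k_4ks name and the default results directory
are only example values:

import glob
import os

fs_name = "xfs_16k_4ks"                                # fs_config.name (example)
results_dir = f"/data/ai-multifs-benchmark/{fs_name}"  # default AI_MULTIFS_RESULTS_DIR
for path in sorted(glob.glob(os.path.join(results_dir, "*"))):
    print(os.path.basename(path))
# expected: filesystem_config.txt, filesystem_stats.txt, milvus_config.json,
#           results_xfs_16k_4ks_<timestamp>.json
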
> diff --git a/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2 b/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
> new file mode 100644
> index 00000000..6216bf46
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_run/templates/milvus_config.json.j2
> @@ -0,0 +1,42 @@
> +{
> + "milvus": {
> + "host": "{{ ai_milvus_host }}",
> + "port": {{ ai_milvus_port }},
> + "database_name": "{{ ai_milvus_database_name }}_{{ fs_config.name }}"
> + },
> + "benchmark": {
> + "vector_dataset_size": {{ ai_vector_dataset_size }},
> + "vector_dimensions": {{ ai_vector_dimensions }},
> + "index_type": "{{ ai_index_type }}",
> + "iterations": {{ ai_benchmark_iterations }},
> + "runtime_seconds": {{ ai_benchmark_runtime }},
> + "warmup_seconds": {{ ai_benchmark_warmup_time }},
> + "query_patterns": {
> + "topk_1": {{ ai_benchmark_query_topk_1 | lower }},
> + "topk_10": {{ ai_benchmark_query_topk_10 | lower }},
> + "topk_100": {{ ai_benchmark_query_topk_100 | lower }}
> + },
> + "batch_sizes": {
> + "batch_1": {{ ai_benchmark_batch_1 | lower }},
> + "batch_10": {{ ai_benchmark_batch_10 | lower }},
> + "batch_100": {{ ai_benchmark_batch_100 | lower }}
> + }
> + },
> + "index_params": {
> +{% if ai_index_type == "HNSW" %}
> + "M": {{ ai_index_hnsw_m }},
> + "efConstruction": {{ ai_index_hnsw_ef_construction }},
> + "ef": {{ ai_index_hnsw_ef }}
> +{% elif ai_index_type == "IVF_FLAT" %}
> + "nlist": {{ ai_index_ivf_nlist }},
> + "nprobe": {{ ai_index_ivf_nprobe }}
> +{% endif %}
> + },
> + "filesystem": {
> + "name": "{{ fs_config.name }}",
> + "type": "{{ fs_config.filesystem }}",
> + "mkfs_cmd": "{{ fs_config.mkfs_cmd }}",
> + "mount_opts": "{{ fs_config.mount_opts }}",
> + "data_directory": "{{ current_fs_benchmark_dir }}"
> + }
> +}
> diff --git a/playbooks/roles/ai_multifs_setup/defaults/main.yml b/playbooks/roles/ai_multifs_setup/defaults/main.yml
> new file mode 100644
> index 00000000..c35d179f
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_setup/defaults/main.yml
> @@ -0,0 +1,49 @@
> +---
> +# Default values for AI multi-filesystem testing
> +ai_multifs_results_dir: "/data/ai-multifs-benchmark"
> +ai_multifs_device: "/dev/vdb"
> +ai_multifs_mount_point: "/mnt/ai-multifs-test"
> +
> +# Filesystem configurations to test
> +ai_multifs_configurations:
> + - name: "xfs_4k_4ks"
> + filesystem: "xfs"
> + mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=4096"
> + mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> + enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_4k_4ks }}"
> +
> + - name: "xfs_16k_4ks"
> + filesystem: "xfs"
> + mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=16384"
> + mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> + enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_16k_4ks }}"
> +
> + - name: "xfs_32k_4ks"
> + filesystem: "xfs"
> + mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=32768"
> + mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> + enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_32k_4ks }}"
> +
> + - name: "xfs_64k_4ks"
> + filesystem: "xfs"
> + mkfs_cmd: "mkfs.xfs -f -s size=4096 -b size=65536"
> + mount_opts: "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> + enabled: "{{ ai_multifs_test_xfs and ai_multifs_xfs_64k_4ks }}"
> +
> + - name: "ext4_4k"
> + filesystem: "ext4"
> + mkfs_cmd: "mkfs.ext4 -F -b 4096"
> + mount_opts: "rw,relatime,data=ordered"
> + enabled: "{{ ai_multifs_test_ext4 and ai_multifs_ext4_4k }}"
> +
> + - name: "ext4_16k_bigalloc"
> + filesystem: "ext4"
> + mkfs_cmd: "mkfs.ext4 -F -b 4096 -C 16384"
> + mount_opts: "rw,relatime,data=ordered"
> + enabled: "{{ ai_multifs_test_ext4 and ai_multifs_ext4_16k_bigalloc }}"
> +
> + - name: "btrfs_default"
> + filesystem: "btrfs"
> + mkfs_cmd: "mkfs.btrfs -f"
> + mount_opts: "rw,relatime,space_cache=v2,discard=async"
> + enabled: "{{ ai_multifs_test_btrfs and ai_multifs_btrfs_default }}"
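
One thing worth double-checking here: enabled is a templated expression, and
depending on how it renders it may come back as the string "True" rather than a
boolean, in which case the selectattr('enabled', 'equalto', true) filter in
tasks/main.yml would silently match nothing. A tiny Python sketch of the kind
of coercion that avoids the pitfall; this is purely illustrative, not the
Ansible code itself:

def to_bool(value):
    """Coerce Ansible-style truthy values ("True", "yes", 1, ...) to a real bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("1", "true", "yes", "on")

configs = [
    {"name": "xfs_4k_4ks", "enabled": "True"},   # what a rendered template may yield
    {"name": "btrfs_default", "enabled": False},
]
print([c["name"] for c in configs if to_bool(c["enabled"])])  # ['xfs_4k_4ks']

In Ansible terms, running each enabled value through the bool filter before
filtering achieves the same thing.
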
> diff --git a/playbooks/roles/ai_multifs_setup/tasks/main.yml b/playbooks/roles/ai_multifs_setup/tasks/main.yml
> new file mode 100644
> index 00000000..28f3ec40
> --- /dev/null
> +++ b/playbooks/roles/ai_multifs_setup/tasks/main.yml
> @@ -0,0 +1,70 @@
> +---
> +- name: Import optional extra_args file
> + include_vars: "{{ item }}"
> + ignore_errors: yes
> + with_items:
> + - "../extra_vars.yaml"
> + tags: vars
> +
> +- name: Create multi-filesystem results directory
> + file:
> + path: "{{ ai_multifs_results_dir }}"
> + state: directory
> + mode: '0755'
> +
> +- name: Create mount point directory
> + file:
> + path: "{{ ai_multifs_mount_point }}"
> + state: directory
> + mode: '0755'
> +
> +- name: Unmount any existing filesystem on mount point
> + mount:
> + path: "{{ ai_multifs_mount_point }}"
> + state: unmounted
> + ignore_errors: yes
> +
> +- name: Install required filesystem utilities
> + package:
> + name:
> + - xfsprogs
> + - e2fsprogs
> + - btrfs-progs
> + state: present
> +
> +- name: Filter enabled filesystem configurations
> + set_fact:
> + enabled_fs_configs: "{{ ai_multifs_configurations | selectattr('enabled', 'equalto', true) | list }}"
> +
> +- name: Display enabled filesystem configurations
> + debug:
> + msg: "Will test {{ enabled_fs_configs | length }} filesystem configurations: {{ enabled_fs_configs | map(attribute='name') | list }}"
> +
> +- name: Validate that device exists
> + stat:
> + path: "{{ ai_multifs_device }}"
> + register: device_stat
> + failed_when: not device_stat.stat.exists
> +
> +- name: Display device information
> + debug:
> + msg: "Using device {{ ai_multifs_device }} for multi-filesystem testing"
> +
> +- name: Create filesystem configuration summary
> + copy:
> + content: |
> + # AI Multi-Filesystem Testing Configuration
> + Generated: {{ ansible_date_time.iso8601 }}
> + Device: {{ ai_multifs_device }}
> + Mount Point: {{ ai_multifs_mount_point }}
> + Results Directory: {{ ai_multifs_results_dir }}
> +
> + Enabled Filesystem Configurations:
> + {% for config in enabled_fs_configs %}
> + - {{ config.name }}:
> + Filesystem: {{ config.filesystem }}
> + mkfs command: {{ config.mkfs_cmd }}
> + Mount options: {{ config.mount_opts }}
> + {% endfor %}
> + dest: "{{ ai_multifs_results_dir }}/test_configuration.txt"
> + mode: '0644'
> diff --git a/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
> index 4ce14fb7..2aaa54ba 100644
> --- a/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
> +++ b/playbooks/roles/ai_run_benchmarks/files/milvus_benchmark.py
> @@ -54,67 +54,83 @@ class MilvusBenchmark:
> )
> self.logger = logging.getLogger(__name__)
>
> - def get_filesystem_info(self, path: str = "/data") -> Dict[str, str]:
> + def get_filesystem_info(self, path: str = "/data/milvus") -> Dict[str, str]:
> """Detect filesystem type for the given path"""
> - try:
> - # Use df -T to get filesystem type
> - result = subprocess.run(
> - ["df", "-T", path], capture_output=True, text=True, check=True
> - )
> -
> - lines = result.stdout.strip().split("\n")
> - if len(lines) >= 2:
> - # Second line contains the filesystem info
> - # Format: Filesystem Type 1K-blocks Used Available Use% Mounted on
> - parts = lines[1].split()
> - if len(parts) >= 2:
> - filesystem_type = parts[1]
> - mount_point = parts[-1] if len(parts) >= 7 else path
> + # Try primary path first, fallback to /data for backwards compatibility
> + paths_to_try = [path]
> + if path != "/data" and not os.path.exists(path):
> + paths_to_try.append("/data")
> +
> + for check_path in paths_to_try:
> + try:
> + # Use df -T to get filesystem type
> + result = subprocess.run(
> + ["df", "-T", check_path], capture_output=True, text=True, check=True
> + )
> +
> + lines = result.stdout.strip().split("\n")
> + if len(lines) >= 2:
> + # Second line contains the filesystem info
> + # Format: Filesystem Type 1K-blocks Used Available Use% Mounted on
> + parts = lines[1].split()
> + if len(parts) >= 2:
> + filesystem_type = parts[1]
> + mount_point = parts[-1] if len(parts) >= 7 else check_path
> +
> + return {
> + "filesystem": filesystem_type,
> + "mount_point": mount_point,
> + "data_path": check_path,
> + }
> + except subprocess.CalledProcessError as e:
> + self.logger.warning(
> + f"Failed to detect filesystem for {check_path}: {e}"
> + )
> + continue
> + except Exception as e:
> + self.logger.warning(f"Error detecting filesystem for {check_path}: {e}")
> + continue
>
> + # Fallback: try to detect from /proc/mounts
> + for check_path in paths_to_try:
> + try:
> + with open("/proc/mounts", "r") as f:
> + mounts = f.readlines()
> +
> + # Find the mount that contains our path
> + best_match = ""
> + best_fs = "unknown"
> +
> + for line in mounts:
> + parts = line.strip().split()
> + if len(parts) >= 3:
> + mount_point = parts[1]
> + fs_type = parts[2]
> +
> + # Check if this mount point is a prefix of our path
> + if check_path.startswith(mount_point) and len(
> + mount_point
> + ) > len(best_match):
> + best_match = mount_point
> + best_fs = fs_type
> +
> + if best_fs != "unknown":
> return {
> - "filesystem": filesystem_type,
> - "mount_point": mount_point,
> - "data_path": path,
> + "filesystem": best_fs,
> + "mount_point": best_match,
> + "data_path": check_path,
> }
> - except subprocess.CalledProcessError as e:
> - self.logger.warning(f"Failed to detect filesystem for {path}: {e}")
> - except Exception as e:
> - self.logger.warning(f"Error detecting filesystem for {path}: {e}")
>
> - # Fallback: try to detect from /proc/mounts
> - try:
> - with open("/proc/mounts", "r") as f:
> - mounts = f.readlines()
> -
> - # Find the mount that contains our path
> - best_match = ""
> - best_fs = "unknown"
> -
> - for line in mounts:
> - parts = line.strip().split()
> - if len(parts) >= 3:
> - mount_point = parts[1]
> - fs_type = parts[2]
> -
> - # Check if this mount point is a prefix of our path
> - if path.startswith(mount_point) and len(mount_point) > len(
> - best_match
> - ):
> - best_match = mount_point
> - best_fs = fs_type
> -
> - if best_fs != "unknown":
> - return {
> - "filesystem": best_fs,
> - "mount_point": best_match,
> - "data_path": path,
> - }
> -
> - except Exception as e:
> - self.logger.warning(f"Error reading /proc/mounts: {e}")
> + except Exception as e:
> + self.logger.warning(f"Error reading /proc/mounts for {check_path}: {e}")
> + continue
>
> # Final fallback
> - return {"filesystem": "unknown", "mount_point": "/", "data_path": path}
> + return {
> + "filesystem": "unknown",
> + "mount_point": "/",
> + "data_path": paths_to_try[0],
> + }
>
> def connect_to_milvus(self) -> bool:
> """Connect to Milvus server"""
> @@ -440,13 +456,47 @@ class MilvusBenchmark:
> """Run complete benchmark suite"""
> self.logger.info("Starting Milvus benchmark suite...")
>
> - # Detect filesystem information
> - fs_info = self.get_filesystem_info("/data")
> + # Detect filesystem information - Milvus data path first
> + milvus_data_path = "/data/milvus"
> + if os.path.exists(milvus_data_path):
> + # Multi-fs mode: Milvus data is on dedicated filesystem
> + fs_info = self.get_filesystem_info(milvus_data_path)
> + self.logger.info(
> + f"Multi-filesystem mode: Using {milvus_data_path} for filesystem detection"
> + )
> + else:
> + # Single-fs mode: fallback to /data
> + fs_info = self.get_filesystem_info("/data")
> + self.logger.info(
> + f"Single-filesystem mode: Using /data for filesystem detection"
> + )
> +
> self.results["system_info"] = fs_info
> +
> + # Add kernel version and hostname to system info
> + try:
> + import socket
> +
> + # Get hostname
> + self.results["system_info"]["hostname"] = socket.gethostname()
> +
> + # Get kernel version using uname -r
> + kernel_result = subprocess.run(['uname', '-r'], capture_output=True, text=True, check=True)
> + self.results["system_info"]["kernel_version"] = kernel_result.stdout.strip()
> +
> + self.logger.info(
> + f"System info: hostname={self.results['system_info']['hostname']}, "
> + f"kernel={self.results['system_info']['kernel_version']}"
> + )
> + except Exception as e:
> + self.logger.warning(f"Could not collect kernel info: {e}")
> + self.results["system_info"]["kernel_version"] = "unknown"
> + self.results["system_info"]["hostname"] = "unknown"
> +
> # Also add filesystem at top level for compatibility with existing graphs
> self.results["filesystem"] = fs_info["filesystem"]
> self.logger.info(
> - f"Detected filesystem: {fs_info['filesystem']} at {fs_info['mount_point']}"
> + f"Detected filesystem: {fs_info['filesystem']} at {fs_info['mount_point']} (data path: {fs_info['data_path']})"
> )
>
> if not self.connect_to_milvus():
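
Small observation on the /proc/mounts fallback above: a bare startswith()
prefix test will also match sibling paths, e.g. /data2 against a /data mount,
and then report the wrong filesystem. A minimal sketch of a stricter check,
offered as an illustration rather than a required change:

import os

def is_under(path, mount_point):
    """True if path is at or below mount_point, without matching /data2 to /data."""
    path = os.path.abspath(path)
    mount_point = os.path.abspath(mount_point)
    if mount_point == "/":
        return True
    return path == mount_point or path.startswith(mount_point.rstrip("/") + "/")

print(is_under("/data/milvus", "/data"))  # True
print(is_under("/data2", "/data"))        # False

The longest-prefix selection already covers the common cases, so this only
bites when a directory shares a mount point's name prefix without being a
mount itself.
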
> diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml
> index 4b35d9f6..d36790b0 100644
> --- a/playbooks/roles/gen_hosts/tasks/main.yml
> +++ b/playbooks/roles/gen_hosts/tasks/main.yml
> @@ -381,6 +381,25 @@
> - workflows_reboot_limit
> - ansible_hosts_template.stat.exists
>
> +- name: Load AI nodes configuration for multi-filesystem setup
> + include_vars:
> + file: "{{ topdir_path }}/{{ kdevops_nodes }}"
> + name: guestfs_nodes
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_hosts_template.stat.exists
> +
> +- name: Extract AI node names for multi-filesystem setup
> + set_fact:
> + all_generic_nodes: "{{ guestfs_nodes.guestfs_nodes | map(attribute='name') | list }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - guestfs_nodes is defined
> +
> - name: Generate the Ansible hosts file for a dedicated AI setup
> tags: ['hosts']
> ansible.builtin.template:
> diff --git a/playbooks/roles/gen_hosts/templates/fstests.j2 b/playbooks/roles/gen_hosts/templates/fstests.j2
> index ac086c6e..32d90abf 100644
> --- a/playbooks/roles/gen_hosts/templates/fstests.j2
> +++ b/playbooks/roles/gen_hosts/templates/fstests.j2
> @@ -70,6 +70,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [krb5:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable or kdevops_smbd_enable or kdevops_krb5_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -85,3 +86,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/gitr.j2 b/playbooks/roles/gen_hosts/templates/gitr.j2
> index 7f9094d4..3f30a5fb 100644
> --- a/playbooks/roles/gen_hosts/templates/gitr.j2
> +++ b/playbooks/roles/gen_hosts/templates/gitr.j2
> @@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/hosts.j2 b/playbooks/roles/gen_hosts/templates/hosts.j2
> index cdcd1883..e9441605 100644
> --- a/playbooks/roles/gen_hosts/templates/hosts.j2
> +++ b/playbooks/roles/gen_hosts/templates/hosts.j2
> @@ -119,39 +119,30 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [ai:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
>
> -{% set fs_configs = [] %}
> +{# Individual section groups for multi-filesystem testing #}
> +{% set section_names = [] %}
> {% for node in all_generic_nodes %}
> -{% set node_parts = node.split('-') %}
> -{% if node_parts|length >= 3 %}
> -{% set fs_type = node_parts[2] %}
> -{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
> -{% set fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
> -{% if fs_group not in fs_configs %}
> -{% set _ = fs_configs.append(fs_group) %}
> +{% if not node.endswith('-dev') %}
> +{% set section = node.replace(kdevops_host_prefix + '-ai-', '') %}
> +{% if section != kdevops_host_prefix + '-ai' %}
> +{% if section_names.append(section) %}{% endif %}
> {% endif %}
> {% endif %}
> {% endfor %}
>
> -{% for fs_group in fs_configs %}
> -[ai_{{ fs_group }}]
> -{% for node in all_generic_nodes %}
> -{% set node_parts = node.split('-') %}
> -{% if node_parts|length >= 3 %}
> -{% set fs_type = node_parts[2] %}
> -{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
> -{% set node_fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
> -{% if node_fs_group == fs_group %}
> -{{ node }}
> -{% endif %}
> +{% for section in section_names %}
> +[ai_{{ section | replace('-', '_') }}]
> +{{ kdevops_host_prefix }}-ai-{{ section }}
> +{% if kdevops_baseline_and_dev %}
> +{{ kdevops_host_prefix }}-ai-{{ section }}-dev
> {% endif %}
> -{% endfor %}
>
> -[ai_{{ fs_group }}:vars]
> +[ai_{{ section | replace('-', '_') }}:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
>
> {% endfor %}
> {% else %}
> -{# Single-node AI hosts #}
> +{# Single filesystem hosts (original behavior) #}
> [all]
> localhost ansible_connection=local
> {{ kdevops_host_prefix }}-ai
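
Since the Jinja above is a bit dense, the derivation it performs is: skip -dev
nodes while collecting section names, strip the '<prefix>-ai-' part from each
remaining node, then emit one [ai_<section>] group with the baseline node and,
when baseline+dev is enabled, its -dev twin. A small Python rendering of the
same logic; the prefix and node names are made-up examples:

prefix = "demo"                       # stands in for kdevops_host_prefix
nodes = [f"{prefix}-ai-xfs-16k-4ks",
         f"{prefix}-ai-xfs-16k-4ks-dev",
         f"{prefix}-ai-btrfs-default"]
baseline_and_dev = True               # mirrors kdevops_baseline_and_dev

sections = []
for n in nodes:
    if n.endswith("-dev"):
        continue
    s = n.replace(f"{prefix}-ai-", "")
    if s != f"{prefix}-ai" and s not in sections:
        sections.append(s)

for s in sections:
    members = [f"{prefix}-ai-{s}"]
    if baseline_and_dev:
        members.append(f"{prefix}-ai-{s}-dev")
    print(f"[ai_{s.replace('-', '_')}]", *members)
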
> diff --git a/playbooks/roles/gen_hosts/templates/nfstest.j2 b/playbooks/roles/gen_hosts/templates/nfstest.j2
> index e427ac34..709d871d 100644
> --- a/playbooks/roles/gen_hosts/templates/nfstest.j2
> +++ b/playbooks/roles/gen_hosts/templates/nfstest.j2
> @@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/pynfs.j2 b/playbooks/roles/gen_hosts/templates/pynfs.j2
> index 85c87dae..55add4d1 100644
> --- a/playbooks/roles/gen_hosts/templates/pynfs.j2
> +++ b/playbooks/roles/gen_hosts/templates/pynfs.j2
> @@ -23,6 +23,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {{ kdevops_hosts_prefix }}-nfsd
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% if true %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -30,3 +31,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {{ kdevops_hosts_prefix }}-nfsd
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml
> index d54977be..b294d294 100644
> --- a/playbooks/roles/gen_nodes/tasks/main.yml
> +++ b/playbooks/roles/gen_nodes/tasks/main.yml
> @@ -658,6 +658,7 @@
> - kdevops_workflow_enable_ai
> - ansible_nodes_template.stat.exists
> - not kdevops_baseline_and_dev
> + - not ai_enable_multifs_testing|default(false)|bool
>
> - name: Generate the AI kdevops nodes file with dev hosts using {{ kdevops_nodes_template }} as jinja2 source template
> tags: ['hosts']
> @@ -675,6 +676,95 @@
> - kdevops_workflow_enable_ai
> - ansible_nodes_template.stat.exists
> - kdevops_baseline_and_dev
> + - not ai_enable_multifs_testing|default(false)|bool
> +
> +- name: Infer enabled AI multi-filesystem configurations
> + vars:
> + kdevops_config_data: "{{ lookup('file', topdir_path + '/.config') }}"
> + # Find all enabled AI multifs configurations
> + xfs_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_XFS_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'xfs-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_XFS=y$', multiline=True)
> + else []
> + }}
> + ext4_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_EXT4_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'ext4-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_EXT4=y$', multiline=True)
> + else []
> + }}
> + btrfs_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_BTRFS_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'btrfs-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_BTRFS=y$', multiline=True)
> + else []
> + }}
> + set_fact:
> + ai_multifs_enabled_configs: "{{ (xfs_configs + ext4_configs + btrfs_configs) | unique }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> +
> +- name: Create AI nodes for each filesystem configuration (no dev)
> + vars:
> + filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
> + set_fact:
> + ai_enabled_section_types: "{{ filesystem_nodes }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - not kdevops_baseline_and_dev
> + - ai_multifs_enabled_configs is defined
> + - ai_multifs_enabled_configs | length > 0
> +
> +- name: Create AI nodes for each filesystem configuration with dev hosts
> + vars:
> + filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
> + set_fact:
> + ai_enabled_section_types: "{{ filesystem_nodes | product(['', '-dev']) | map('join') | list }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - kdevops_baseline_and_dev
> + - ai_multifs_enabled_configs is defined
> + - ai_multifs_enabled_configs | length > 0
> +
> +- name: Generate the AI multi-filesystem kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template
> + tags: [ 'hosts' ]
> + vars:
> + node_template: "{{ kdevops_nodes_template | basename }}"
> + nodes: "{{ ai_enabled_section_types | regex_replace('\\[') | regex_replace('\\]') | replace(\"'\", '') | split(', ') }}"
> + all_generic_nodes: "{{ ai_enabled_section_types }}"
> + template:
> + src: "{{ node_template }}"
> + dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
> + force: yes
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - ai_enabled_section_types is defined
> + - ai_enabled_section_types | length > 0
>
> - name: Get the control host's timezone
> ansible.builtin.command: "timedatectl show -p Timezone --value"
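
The regex gymnastics above boil down to: if CONFIG_AI_MULTIFS_TEST_<FS>=y is
set, collect every CONFIG_AI_MULTIFS_<FS>_<PROFILE>=y symbol and turn it into a
node suffix like xfs-16k-4ks. A short Python sketch of the same transformation
against a made-up .config fragment:

import re

config = (
    "CONFIG_AI_MULTIFS_TEST_XFS=y\n"
    "CONFIG_AI_MULTIFS_XFS_16K_4KS=y\n"
    "CONFIG_AI_MULTIFS_XFS_64K_4KS=y\n"
)

xfs_configs = []
if re.search(r"^CONFIG_AI_MULTIFS_TEST_XFS=y$", config, re.M):
    xfs_configs = ["xfs-" + m.lower().replace("_", "-")
                   for m in re.findall(r"^CONFIG_AI_MULTIFS_XFS_(.*)=y$", config, re.M)]
print(xfs_configs)  # ['xfs-16k-4ks', 'xfs-64k-4ks']
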
> diff --git a/playbooks/roles/guestfs/tasks/bringup/main.yml b/playbooks/roles/guestfs/tasks/bringup/main.yml
> index c131de25..bd9f5260 100644
> --- a/playbooks/roles/guestfs/tasks/bringup/main.yml
> +++ b/playbooks/roles/guestfs/tasks/bringup/main.yml
> @@ -1,11 +1,16 @@
> ---
> - name: List defined libvirt guests
> run_once: true
> + delegate_to: localhost
> community.libvirt.virt:
> command: list_vms
> uri: "{{ libvirt_uri }}"
> register: defined_vms
>
> +- name: Debug defined VMs
> + debug:
> + msg: "Hostname: {{ inventory_hostname }}, Defined VMs: {{ hostvars['localhost']['defined_vms']['list_vms'] | default([]) }}, Check: {{ inventory_hostname not in (hostvars['localhost']['defined_vms']['list_vms'] | default([])) }}"
> +
> - name: Provision each target node
> when:
> - "inventory_hostname not in defined_vms.list_vms"
> @@ -25,10 +30,13 @@
> path: "{{ ssh_key_dir }}"
> state: directory
> mode: "u=rwx"
> + delegate_to: localhost
>
> - name: Generate fresh keys for each target node
> ansible.builtin.command:
> cmd: 'ssh-keygen -q -t ed25519 -f {{ ssh_key }} -N ""'
> + creates: "{{ ssh_key }}"
> + delegate_to: localhost
>
> - name: Set the pathname of the root disk image for each target node
> ansible.builtin.set_fact:
> @@ -38,15 +46,18 @@
> ansible.builtin.file:
> path: "{{ storagedir }}/{{ inventory_hostname }}"
> state: directory
> + delegate_to: localhost
>
> - name: Duplicate the root disk image for each target node
> ansible.builtin.command:
> cmd: "cp --reflink=auto {{ base_image }} {{ root_image }}"
> + delegate_to: localhost
>
> - name: Get the timezone of the control host
> ansible.builtin.command:
> cmd: "timedatectl show -p Timezone --value"
> register: host_timezone
> + delegate_to: localhost
>
> - name: Build the root image for each target node (as root)
> become: true
> @@ -103,6 +114,7 @@
> name: "{{ inventory_hostname }}"
> xml: "{{ lookup('file', xml_file) }}"
> uri: "{{ libvirt_uri }}"
> + delegate_to: localhost
>
> - name: Find PCIe passthrough devices
> ansible.builtin.find:
> @@ -110,6 +122,7 @@
> file_type: file
> patterns: "pcie_passthrough_*.xml"
> register: passthrough_devices
> + delegate_to: localhost
>
> - name: Attach PCIe passthrough devices to each target node
> environment:
> @@ -124,6 +137,7 @@
> loop: "{{ passthrough_devices.files }}"
> loop_control:
> label: "Doing PCI-E passthrough for device {{ item }}"
> + delegate_to: localhost
> when:
> - passthrough_devices.matched > 0
>
> @@ -142,3 +156,4 @@
> name: "{{ inventory_hostname }}"
> uri: "{{ libvirt_uri }}"
> state: running
> + delegate_to: localhost
> diff --git a/scripts/guestfs.Makefile b/scripts/guestfs.Makefile
> index bd03f58c..f6c350a4 100644
> --- a/scripts/guestfs.Makefile
> +++ b/scripts/guestfs.Makefile
> @@ -79,7 +79,7 @@ bringup_guestfs: $(GUESTFS_BRINGUP_DEPS)
> --extra-vars=@./extra_vars.yaml \
> --tags network,pool,base_image
> $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
> - --limit 'baseline:dev:service' \
> + --limit 'baseline:dev:service:ai' \
> playbooks/guestfs.yml \
> --extra-vars=@./extra_vars.yaml \
> --tags bringup
> diff --git a/workflows/ai/Kconfig b/workflows/ai/Kconfig
> index 2ffc6b65..d04570d8 100644
> --- a/workflows/ai/Kconfig
> +++ b/workflows/ai/Kconfig
> @@ -161,4 +161,17 @@ config AI_BENCHMARK_ITERATIONS
> # Docker storage configuration
> source "workflows/ai/Kconfig.docker-storage"
>
> +# Multi-filesystem configuration
> +config AI_MULTIFS_ENABLE
> + bool "Enable multi-filesystem benchmarking"
> + output yaml
> + default n
> + help
> + Run AI benchmarks across multiple filesystem configurations
> + to compare performance characteristics.
> +
> +if AI_MULTIFS_ENABLE
> +source "workflows/ai/Kconfig.multifs"
> +endif
> +
> endif # KDEVOPS_WORKFLOW_ENABLE_AI
> diff --git a/workflows/ai/Kconfig.fs b/workflows/ai/Kconfig.fs
> new file mode 100644
> index 00000000..a95d02c6
> --- /dev/null
> +++ b/workflows/ai/Kconfig.fs
> @@ -0,0 +1,118 @@
> +menu "Target filesystem to use"
> +
> +choice
> + prompt "Target filesystem"
> + default AI_FILESYSTEM_XFS
> +
> +config AI_FILESYSTEM_XFS
> + bool "xfs"
> + select HAVE_SUPPORTS_PURE_IOMAP if BOOTLINUX_TREE_LINUS || BOOTLINUX_TREE_STABLE
> + help
> + This will target testing AI workloads on top of XFS.
> + XFS provides excellent performance for large datasets
> + and is commonly used in high-performance computing.
> +
> +config AI_FILESYSTEM_BTRFS
> + bool "btrfs"
> + help
> + This will target testing AI workloads on top of btrfs.
> + Btrfs provides features like snapshots and compression
> + which can be useful for AI dataset management.
> +
> +config AI_FILESYSTEM_EXT4
> + bool "ext4"
> + help
> + This will target testing AI workloads on top of ext4.
> + Ext4 is widely supported and provides reliable performance
> + for AI workloads.
> +
> +endchoice
> +
> +config AI_FILESYSTEM
> + string
> + output yaml
> + default "xfs" if AI_FILESYSTEM_XFS
> + default "btrfs" if AI_FILESYSTEM_BTRFS
> + default "ext4" if AI_FILESYSTEM_EXT4
> +
> +config AI_FSTYPE
> + string
> + output yaml
> + default "xfs" if AI_FILESYSTEM_XFS
> + default "btrfs" if AI_FILESYSTEM_BTRFS
> + default "ext4" if AI_FILESYSTEM_EXT4
> +
> +if AI_FILESYSTEM_XFS
> +
> +menu "XFS configuration"
> +
> +config AI_XFS_MKFS_OPTS
> + string "mkfs.xfs options"
> + output yaml
> + default "-f -s size=4096"
> + help
> + Additional options to pass to mkfs.xfs when creating
> + the filesystem for AI workloads.
> +
> +config AI_XFS_MOUNT_OPTS
> + string "XFS mount options"
> + output yaml
> + default "rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota"
> + help
> + Mount options for XFS filesystem. These options are
> + optimized for AI workloads with large sequential I/O.
> +
> +endmenu
> +
> +endif # AI_FILESYSTEM_XFS
> +
> +if AI_FILESYSTEM_BTRFS
> +
> +menu "Btrfs configuration"
> +
> +config AI_BTRFS_MKFS_OPTS
> + string "mkfs.btrfs options"
> + output yaml
> + default "-f"
> + help
> + Additional options to pass to mkfs.btrfs when creating
> + the filesystem for AI workloads.
> +
> +config AI_BTRFS_MOUNT_OPTS
> + string "Btrfs mount options"
> + output yaml
> + default "rw,relatime,compress=lz4,space_cache=v2"
> + help
> + Mount options for Btrfs filesystem. LZ4 compression
> + can help with AI datasets while maintaining performance.
> +
> +endmenu
> +
> +endif # AI_FILESYSTEM_BTRFS
> +
> +if AI_FILESYSTEM_EXT4
> +
> +menu "Ext4 configuration"
> +
> +config AI_EXT4_MKFS_OPTS
> + string "mkfs.ext4 options"
> + output yaml
> + default "-F"
> + help
> + Additional options to pass to mkfs.ext4 when creating
> + the filesystem for AI workloads.
> +
> +config AI_EXT4_MOUNT_OPTS
> + string "Ext4 mount options"
> + output yaml
> + default "rw,relatime,data=ordered"
> + help
> + Mount options for Ext4 filesystem optimized for
> + AI workload patterns.
> +
> +endmenu
> +
> +endif # AI_FILESYSTEM_EXT4
> +
> +
> +endmenu
> diff --git a/workflows/ai/Kconfig.multifs b/workflows/ai/Kconfig.multifs
> new file mode 100644
> index 00000000..2b72dd6c
> --- /dev/null
> +++ b/workflows/ai/Kconfig.multifs
> @@ -0,0 +1,184 @@
> +menu "Multi-filesystem testing configuration"
> +
> +config AI_ENABLE_MULTIFS_TESTING
> + bool "Enable multi-filesystem testing"
> + default n
> + output yaml
> + help
> + Enable testing the same AI workload across multiple filesystem
> + configurations. This allows comparing performance characteristics
> + between different filesystems and their configurations.
> +
> + When enabled, the AI benchmark will run sequentially across all
> + selected filesystem configurations, allowing for detailed
> + performance analysis across different storage backends.
> +
> +if AI_ENABLE_MULTIFS_TESTING
> +
> +config AI_MULTIFS_TEST_XFS
> + bool "Test XFS configurations"
> + default y
> + output yaml
> + help
> + Enable testing AI workloads on XFS filesystem with different
> + block size configurations.
> +
> +if AI_MULTIFS_TEST_XFS
> +
> +menu "XFS configuration profiles"
> +
> +config AI_MULTIFS_XFS_4K_4KS
> + bool "XFS 4k block size - 4k sector size"
> + default y
> + output yaml
> + help
> + Test AI workloads on XFS with 4k filesystem block size
> + and 4k sector size. This is the most common configuration
> + and provides good performance for most workloads.
> +
> +config AI_MULTIFS_XFS_16K_4KS
> + bool "XFS 16k block size - 4k sector size"
> + default y
> + output yaml
> + help
> + Test AI workloads on XFS with 16k filesystem block size
> + and 4k sector size. Larger block sizes can improve performance
> + for sequential I/O patterns common in AI workloads.
> +
> +config AI_MULTIFS_XFS_32K_4KS
> + bool "XFS 32k block size - 4k sector size"
> + default y
> + output yaml
> + help
> + Test AI workloads on XFS with 32k filesystem block size
> + and 4k sector size. Even larger block sizes can provide
> + benefits for large sequential I/O operations typical in
> + AI vector database workloads.
> +
> +config AI_MULTIFS_XFS_64K_4KS
> + bool "XFS 64k block size - 4k sector size"
> + default y
> + output yaml
> + help
> + Test AI workloads on XFS with 64k filesystem block size
> + and 4k sector size. Maximum supported block size for XFS,
> + optimized for very large file operations and high-throughput
> + AI workloads with substantial data transfers.
> +
> +endmenu
> +
> +endif # AI_MULTIFS_TEST_XFS
> +
> +config AI_MULTIFS_TEST_EXT4
> + bool "Test ext4 configurations"
> + default y
> + output yaml
> + help
> + Enable testing AI workloads on ext4 filesystem with different
> + configurations including bigalloc options.
> +
> +if AI_MULTIFS_TEST_EXT4
> +
> +menu "ext4 configuration profiles"
> +
> +config AI_MULTIFS_EXT4_4K
> + bool "ext4 4k block size"
> + default y
> + output yaml
> + help
> + Test AI workloads on ext4 with standard 4k block size.
> + This is the default ext4 configuration.
> +
> +config AI_MULTIFS_EXT4_16K_BIGALLOC
> + bool "ext4 16k bigalloc"
> + default y
> + output yaml
> + help
> + Test AI workloads on ext4 with 16k bigalloc enabled.
> + Bigalloc reduces metadata overhead and can improve
> + performance for large file workloads.
> +
> +endmenu
> +
> +endif # AI_MULTIFS_TEST_EXT4
> +
> +config AI_MULTIFS_TEST_BTRFS
> + bool "Test btrfs configurations"
> + default y
> + output yaml
> + help
> + Enable testing AI workloads on btrfs filesystem with
> + common default configuration profile.
> +
> +if AI_MULTIFS_TEST_BTRFS
> +
> +menu "btrfs configuration profiles"
> +
> +config AI_MULTIFS_BTRFS_DEFAULT
> + bool "btrfs default profile"
> + default y
> + output yaml
> + help
> + Test AI workloads on btrfs with default configuration.
> + This includes modern defaults with free-space-tree and
> + no-holes features enabled.
> +
> +endmenu
> +
> +endif # AI_MULTIFS_TEST_BTRFS
> +
> +config AI_MULTIFS_RESULTS_DIR
> + string "Multi-filesystem results directory"
> + output yaml
> + default "/data/ai-multifs-benchmark"
> + help
> + Directory where multi-filesystem test results and logs will be stored.
> + Each filesystem configuration will have its own subdirectory.
> +
> +config AI_MILVUS_STORAGE_ENABLE
> + bool "Enable dedicated Milvus storage with filesystem matching node profile"
> + default y
> + output yaml
> + help
> + Configure a dedicated storage device for Milvus data including
> + vector data (MinIO), metadata (etcd), and local cache. The filesystem
> + type will automatically match the node's configuration profile.
> +
> +config AI_MILVUS_DEVICE
> + string "Device to use for Milvus storage"
> + output yaml
> + default "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_NVME
> + default "/dev/disk/by-id/virtio-kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_VIRTIO
> + default "/dev/disk/by-id/ata-QEMU_HARDDISK_kdevops3" if LIBVIRT && LIBVIRT_EXTRA_STORAGE_DRIVE_IDE
> + default "/dev/nvme3n1" if TERRAFORM_AWS_INSTANCE_M5AD_2XLARGE
> + default "/dev/nvme3n1" if TERRAFORM_AWS_INSTANCE_M5AD_4XLARGE
> + default "/dev/nvme3n1" if TERRAFORM_GCE
> + default "/dev/sde" if TERRAFORM_AZURE
> + default TERRAFORM_OCI_SPARSE_VOLUME_DEVICE_FILE_NAME if TERRAFORM_OCI
> + help
> + The device to use for Milvus storage. This device will be
> + formatted with the filesystem type matching the node's profile
> + and mounted at /data/milvus.
> +
> +config AI_MILVUS_MOUNT_POINT
> + string "Mount point for Milvus storage"
> + output yaml
> + default "/data/milvus"
> + help
> + The path where the Milvus storage filesystem will be mounted.
> + All Milvus data directories (data/, etcd/, minio/) will be
> + created under this mount point.
> +
> +config AI_MILVUS_USE_NODE_FS
> + bool "Automatically detect filesystem type from node name"
> + default y
> + output yaml
> + help
> + When enabled, the filesystem type for Milvus storage will be
> + automatically determined based on the node's configuration name.
> + For example, nodes named *-xfs-* will use XFS, *-ext4-* will
> + use ext4, and *-btrfs-* will use Btrfs.
> +
> +endif # AI_ENABLE_MULTIFS_TESTING
> +
> +endmenu
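
On AI_MILVUS_USE_NODE_FS: the help text describes a purely name-based
convention, so it helps to be explicit about what that inference looks like. A
sketch of the mapping it implies; the real detection lives in the playbooks,
and the node name below is only an example:

def fs_from_node_name(node):
    """Guess the Milvus storage filesystem from a kdevops node name."""
    for fs in ("xfs", "ext4", "btrfs"):
        if f"-{fs}-" in node or node.endswith(f"-{fs}"):
            return fs
    return None

print(fs_from_node_name("demo-ai-ext4-16k-bigalloc"))  # 'ext4'
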
> diff --git a/workflows/ai/scripts/analysis_config.json b/workflows/ai/scripts/analysis_config.json
> index 2f90f4d5..5f0a9328 100644
> --- a/workflows/ai/scripts/analysis_config.json
> +++ b/workflows/ai/scripts/analysis_config.json
> @@ -2,5 +2,5 @@
> "enable_graphing": true,
> "graph_format": "png",
> "graph_dpi": 150,
> - "graph_theme": "seaborn"
> + "graph_theme": "default"
> }
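
The seaborn -> default switch deserves a brief justification: recent matplotlib
releases removed the bare "seaborn" style name (only the "seaborn-v0_8-*"
variants remain), so requesting it raises an exception, while "default" is
always accepted. If the theme ever becomes configurable again, a defensive
selection along these lines avoids the same trap (sketch only):

import matplotlib.pyplot as plt

theme = "default"  # e.g. read from analysis_config.json
# "default" is special-cased by matplotlib and never appears in style.available
if theme == "default" or theme in plt.style.available:
    plt.style.use(theme)
else:
    plt.style.use("default")
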
> diff --git a/workflows/ai/scripts/analyze_results.py b/workflows/ai/scripts/analyze_results.py
> index 3d11fb11..2dc4a1d6 100755
> --- a/workflows/ai/scripts/analyze_results.py
> +++ b/workflows/ai/scripts/analyze_results.py
> @@ -226,6 +226,68 @@ class ResultsAnalyzer:
>
> return fs_info
>
> + def _extract_filesystem_config(
> + self, result: Dict[str, Any]
> + ) -> tuple[str, str, str]:
> + """Extract filesystem type and block size from result data.
> + Returns (fs_type, block_size, config_key)"""
> + filename = result.get("_file", "")
> +
> + # Primary: Extract filesystem type from filename (more reliable than JSON)
> + fs_type = "unknown"
> + block_size = "default"
> +
> + if "xfs" in filename:
> + fs_type = "xfs"
> + # Check larger sizes first to avoid substring matches
> +            if "64k-" in filename:
> +                block_size = "64k"
> +            elif "32k-" in filename:
> +                block_size = "32k"
> +            elif "16k-" in filename:
> +                block_size = "16k"
> +            elif "4k-" in filename:
> +                block_size = "4k"
> + elif "ext4" in filename:
> + fs_type = "ext4"
> + if "16k" in filename:
> + block_size = "16k"
> + elif "4k" in filename:
> + block_size = "4k"
> + elif "btrfs" in filename:
> + fs_type = "btrfs"
> + block_size = "default"
> + else:
> + # Fallback to JSON data if filename parsing fails
> + fs_type = result.get("filesystem", "unknown")
> + self.logger.warning(
> + f"Could not determine filesystem from filename {filename}, using JSON data: {fs_type}"
> + )
> +
> + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
> + return fs_type, block_size, config_key
> +
> + def _extract_node_info(self, result: Dict[str, Any]) -> tuple[str, bool]:
> + """Extract node hostname and determine if it's a dev node.
> + Returns (hostname, is_dev_node)"""
> + # Get hostname from system_info (preferred) or fall back to filename
> + system_info = result.get("system_info", {})
> + hostname = system_info.get("hostname", "")
> +
> + # If no hostname in system_info, try extracting from filename
> + if not hostname:
> + filename = result.get("_file", "")
> + # Remove results_ prefix and .json suffix
> + hostname = filename.replace("results_", "").replace(".json", "")
> + # Remove iteration number if present (_1, _2, etc.)
> + if "_" in hostname and hostname.split("_")[-1].isdigit():
> + hostname = "_".join(hostname.split("_")[:-1])
> +
> + # Determine if this is a dev node
> + is_dev = hostname.endswith("-dev")
> +
> + return hostname, is_dev
> +
> def load_results(self) -> bool:
> """Load all result files from the results directory"""
> try:
> @@ -391,6 +453,8 @@ class ResultsAnalyzer:
> html.append(
> " .highlight { background-color: #fff3cd; padding: 10px; border-radius: 3px; }"
> )
> + html.append(" .baseline-row { background-color: #e8f5e9; }")
> + html.append(" .dev-row { background-color: #e3f2fd; }")
> html.append(" </style>")
> html.append("</head>")
> html.append("<body>")
> @@ -486,26 +550,69 @@ class ResultsAnalyzer:
> else:
> html.append(" <p>No storage device information available.</p>")
>
> - # Filesystem section
> - html.append(" <h3>🗂️ Filesystem Configuration</h3>")
> - fs_info = self.system_info.get("filesystem_info", {})
> - html.append(" <table class='config-table'>")
> - html.append(
> - " <tr><td>Filesystem Type</td><td>"
> - + str(fs_info.get("filesystem_type", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(
> - " <tr><td>Mount Point</td><td>"
> - + str(fs_info.get("mount_point", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(
> - " <tr><td>Mount Options</td><td>"
> - + str(fs_info.get("mount_options", "Unknown"))
> - + "</td></tr>"
> - )
> - html.append(" </table>")
> + # Node Configuration section - Extract from actual benchmark results
> + html.append(" <h3>🗂️ Node Configuration</h3>")
> +
> + # Collect node and filesystem information from benchmark results
> + node_configs = {}
> + for result in self.results_data:
> + # Extract node information
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(
> + result
> + )
> +
> + system_info = result.get("system_info", {})
> + data_path = system_info.get("data_path", "/data/milvus")
> + mount_point = system_info.get("mount_point", "/data")
> + kernel_version = system_info.get("kernel_version", "unknown")
> +
> + if hostname not in node_configs:
> + node_configs[hostname] = {
> + "hostname": hostname,
> + "node_type": "Development" if is_dev else "Baseline",
> + "filesystem": fs_type,
> + "block_size": block_size,
> + "data_path": data_path,
> + "mount_point": mount_point,
> + "kernel": kernel_version,
> + "test_count": 0,
> + }
> + node_configs[hostname]["test_count"] += 1
> +
> + if node_configs:
> + html.append(" <table class='config-table'>")
> + html.append(
> + " <tr><th>Node</th><th>Type</th><th>Filesystem</th><th>Block Size</th><th>Data Path</th><th>Mount Point</th><th>Kernel</th><th>Tests</th></tr>"
> + )
> + # Sort nodes with baseline first, then dev
> + sorted_nodes = sorted(
> + node_configs.items(),
> + key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
> + )
> + for hostname, config_info in sorted_nodes:
> + row_class = (
> + "dev-row"
> + if config_info["node_type"] == "Development"
> + else "baseline-row"
> + )
> + html.append(f" <tr class='{row_class}'>")
> + html.append(f" <td><strong>{hostname}</strong></td>")
> + html.append(f" <td>{config_info['node_type']}</td>")
> + html.append(f" <td>{config_info['filesystem']}</td>")
> + html.append(f" <td>{config_info['block_size']}</td>")
> + html.append(f" <td>{config_info['data_path']}</td>")
> + html.append(
> + f" <td>{config_info['mount_point']}</td>"
> + )
> + html.append(f" <td>{config_info['kernel']}</td>")
> + html.append(f" <td>{config_info['test_count']}</td>")
> + html.append(f" </tr>")
> + html.append(" </table>")
> + else:
> + html.append(
> + " <p>No node configuration data found in results.</p>"
> + )
> html.append(" </div>")
>
> # Test Configuration Section
> @@ -551,92 +658,192 @@ class ResultsAnalyzer:
> html.append(" </table>")
> html.append(" </div>")
>
> - # Performance Results Section
> + # Performance Results Section - Per Node
> html.append(" <div class='section'>")
> - html.append(" <h2>📊 Performance Results Summary</h2>")
> + html.append(" <h2>📊 Performance Results by Node</h2>")
>
> if self.results_data:
> - # Insert performance
> - insert_times = [
> - r.get("insert_performance", {}).get("total_time_seconds", 0)
> - for r in self.results_data
> - ]
> - insert_rates = [
> - r.get("insert_performance", {}).get("vectors_per_second", 0)
> - for r in self.results_data
> - ]
> -
> - if insert_times and any(t > 0 for t in insert_times):
> - html.append(" <h3>📈 Vector Insert Performance</h3>")
> - html.append(" <table class='metric-table'>")
> - html.append(
> - f" <tr><td>Average Insert Time</td><td>{np.mean(insert_times):.2f} seconds</td></tr>"
> - )
> - html.append(
> - f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
> + # Group results by node
> + node_performance = {}
> +
> + for result in self.results_data:
> + # Use node hostname as the grouping key
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(
> + result
> )
> - html.append(
> - f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
> - )
> - html.append(" </table>")
>
> - # Index performance
> - index_times = [
> - r.get("index_performance", {}).get("creation_time_seconds", 0)
> - for r in self.results_data
> - ]
> - if index_times and any(t > 0 for t in index_times):
> - html.append(" <h3>🔗 Index Creation Performance</h3>")
> - html.append(" <table class='metric-table'>")
> - html.append(
> - f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.2f} seconds</td></tr>"
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "hostname": hostname,
> + "node_type": "Development" if is_dev else "Baseline",
> + "insert_rates": [],
> + "insert_times": [],
> + "index_times": [],
> + "query_performance": {},
> + "filesystem": fs_type,
> + "block_size": block_size,
> + }
> +
> + # Add insert performance
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + rate = insert_perf.get("vectors_per_second", 0)
> + time = insert_perf.get("total_time_seconds", 0)
> + if rate > 0:
> + node_performance[hostname]["insert_rates"].append(rate)
> + if time > 0:
> + node_performance[hostname]["insert_times"].append(time)
> +
> + # Add index performance
> + index_perf = result.get("index_performance", {})
> + if index_perf:
> + time = index_perf.get("creation_time_seconds", 0)
> + if time > 0:
> + node_performance[hostname]["index_times"].append(time)
> +
> + # Collect query performance (use first result for each node)
> + query_perf = result.get("query_performance", {})
> + if (
> + query_perf
> + and not node_performance[hostname]["query_performance"]
> + ):
> + node_performance[hostname]["query_performance"] = query_perf
> +
> + # Display results for each node, sorted with baseline first
> + sorted_nodes = sorted(
> + node_performance.items(),
> + key=lambda x: (x[1]["node_type"] != "Baseline", x[0]),
> + )
> + for hostname, perf_data in sorted_nodes:
> + node_type_badge = (
> + "🔵" if perf_data["node_type"] == "Development" else "🟢"
> )
> html.append(
> - f" <tr><td>Index Time Range</td><td>{np.min(index_times):.2f} - {np.max(index_times):.2f} seconds</td></tr>"
> + f" <h3>{node_type_badge} {hostname} ({perf_data['node_type']})</h3>"
> )
> - html.append(" </table>")
> -
> - # Query performance
> - html.append(" <h3>🔍 Query Performance</h3>")
> - first_query_perf = self.results_data[0].get("query_performance", {})
> - if first_query_perf:
> - html.append(" <table>")
> html.append(
> - " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
> + f" <p>Filesystem: {perf_data['filesystem']}, Block Size: {perf_data['block_size']}</p>"
> )
>
> - for topk, topk_data in first_query_perf.items():
> - for batch, batch_data in topk_data.items():
> - qps = batch_data.get("queries_per_second", 0)
> - avg_time = batch_data.get("average_time_seconds", 0) * 1000
> -
> - # Color coding for performance
> - qps_class = ""
> - if qps > 1000:
> - qps_class = "performance-good"
> - elif qps > 100:
> - qps_class = "performance-warning"
> - else:
> - qps_class = "performance-poor"
> -
> - html.append(f" <tr>")
> - html.append(
> - f" <td>{topk.replace('topk_', 'Top-')}</td>"
> - )
> - html.append(
> - f" <td>{batch.replace('batch_', 'Batch ')}</td>"
> - )
> - html.append(
> - f" <td class='{qps_class}'>{qps:.2f}</td>"
> - )
> - html.append(f" <td>{avg_time:.2f}</td>")
> - html.append(f" </tr>")
> + # Insert performance
> + insert_rates = perf_data["insert_rates"]
> + if insert_rates:
> + html.append(" <h4>📈 Vector Insert Performance</h4>")
> + html.append(" <table class='metric-table'>")
> + html.append(
> + f" <tr><td>Average Insert Rate</td><td>{np.mean(insert_rates):.2f} vectors/sec</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Insert Rate Range</td><td>{np.min(insert_rates):.2f} - {np.max(insert_rates):.2f} vectors/sec</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Test Iterations</td><td>{len(insert_rates)}</td></tr>"
> + )
> + html.append(" </table>")
> +
> + # Index performance
> + index_times = perf_data["index_times"]
> + if index_times:
> + html.append(" <h4>🔗 Index Creation Performance</h4>")
> + html.append(" <table class='metric-table'>")
> + html.append(
> + f" <tr><td>Average Index Creation Time</td><td>{np.mean(index_times):.3f} seconds</td></tr>"
> + )
> + html.append(
> + f" <tr><td>Index Time Range</td><td>{np.min(index_times):.3f} - {np.max(index_times):.3f} seconds</td></tr>"
> + )
> + html.append(" </table>")
> +
> + # Query performance
> + query_perf = perf_data["query_performance"]
> + if query_perf:
> + html.append(" <h4>🔍 Query Performance</h4>")
> + html.append(" <table>")
> + html.append(
> + " <tr><th>Query Type</th><th>Batch Size</th><th>QPS</th><th>Avg Latency (ms)</th></tr>"
> + )
>
> - html.append(" </table>")
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + qps = batch_data.get("queries_per_second", 0)
> + avg_time = (
> + batch_data.get("average_time_seconds", 0) * 1000
> + )
> +
> + # Color coding for performance
> + qps_class = ""
> + if qps > 1000:
> + qps_class = "performance-good"
> + elif qps > 100:
> + qps_class = "performance-warning"
> + else:
> + qps_class = "performance-poor"
> +
> + html.append(f" <tr>")
> + html.append(
> + f" <td>{topk.replace('topk_', 'Top-')}</td>"
> + )
> + html.append(
> + f" <td>{batch.replace('batch_', 'Batch ')}</td>"
> + )
> + html.append(
> + f" <td class='{qps_class}'>{qps:.2f}</td>"
> + )
> + html.append(f" <td>{avg_time:.2f}</td>")
> + html.append(f" </tr>")
> + html.append(" </table>")
> +
> + html.append(" <br>") # Add spacing between configurations
>
> - html.append(" </div>")
> + html.append(" </div>")
>
> # Footer
> + # Performance Graphs Section
> + html.append(" <div class='section'>")
> + html.append(" <h2>📈 Performance Visualizations</h2>")
> + html.append(
> + " <p>The following graphs provide visual analysis of the benchmark results across all tested filesystem configurations:</p>"
> + )
> + html.append(" <ul>")
> + html.append(
> + " <li><strong>Insert Performance:</strong> Shows vector insertion rates and times for each filesystem configuration</li>"
> + )
> + html.append(
> + " <li><strong>Query Performance:</strong> Displays query performance heatmaps for different Top-K and batch sizes</li>"
> + )
> + html.append(
> + " <li><strong>Index Performance:</strong> Compares index creation times across filesystems</li>"
> + )
> + html.append(
> + " <li><strong>Performance Matrix:</strong> Comprehensive comparison matrix of all metrics</li>"
> + )
> + html.append(
> + " <li><strong>Filesystem Comparison:</strong> Side-by-side comparison of filesystem performance</li>"
> + )
> + html.append(" </ul>")
> + html.append(
> + " <p><em>Note: Graphs are generated as separate PNG files in the same directory as this report.</em></p>"
> + )
> + html.append(" <div style='margin-top: 20px;'>")
> + html.append(
> + " <img src='insert_performance.png' alt='Insert Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='query_performance.png' alt='Query Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='index_performance.png' alt='Index Performance' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='performance_matrix.png' alt='Performance Matrix' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(
> + " <img src='filesystem_comparison.png' alt='Filesystem Comparison' style='max-width: 100%; height: auto; margin-bottom: 20px;'>"
> + )
> + html.append(" </div>")
> + html.append(" </div>")
> +
> html.append(" <div class='section'>")
> html.append(" <h2>📝 Notes</h2>")
> html.append(" <ul>")
> @@ -661,10 +868,11 @@ class ResultsAnalyzer:
> return "\n".join(html)
>
> except Exception as e:
> - self.logger.error(f"Error generating HTML report: {e}")
> - return (
> - f"<html><body><h1>Error generating HTML report: {e}</h1></body></html>"
> - )
> + import traceback
> +
> + tb = traceback.format_exc()
> + self.logger.error(f"Error generating HTML report: {e}\n{tb}")
> + return f"<html><body><h1>Error generating HTML report: {e}</h1><pre>{tb}</pre></body></html>"
>
> def generate_graphs(self) -> bool:
> """Generate performance visualization graphs"""
> @@ -691,6 +899,9 @@ class ResultsAnalyzer:
> # Graph 4: Performance Comparison Matrix
> self._plot_performance_matrix()
>
> + # Graph 5: Multi-filesystem Comparison (if applicable)
> + self._plot_filesystem_comparison()
> +
> self.logger.info("Graphs generated successfully")
> return True
>
> @@ -699,34 +910,188 @@ class ResultsAnalyzer:
> return False
>
> def _plot_insert_performance(self):
> - """Plot insert performance metrics"""
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> + """Plot insert performance metrics with node differentiation"""
> + # Group data by node
> + node_performance = {}
>
> - # Extract insert data
> - iterations = []
> - insert_rates = []
> - insert_times = []
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> +
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": is_dev,
> + }
>
> - for i, result in enumerate(self.results_data):
> insert_perf = result.get("insert_performance", {})
> if insert_perf:
> - iterations.append(i + 1)
> - insert_rates.append(insert_perf.get("vectors_per_second", 0))
> - insert_times.append(insert_perf.get("total_time_seconds", 0))
> -
> - # Plot insert rate
> - ax1.plot(iterations, insert_rates, "b-o", linewidth=2, markersize=6)
> - ax1.set_xlabel("Iteration")
> - ax1.set_ylabel("Vectors/Second")
> - ax1.set_title("Vector Insert Rate Performance")
> - ax1.grid(True, alpha=0.3)
> -
> - # Plot insert time
> - ax2.plot(iterations, insert_times, "r-o", linewidth=2, markersize=6)
> - ax2.set_xlabel("Iteration")
> - ax2.set_ylabel("Total Time (seconds)")
> - ax2.set_title("Vector Insert Time Performance")
> - ax2.grid(True, alpha=0.3)
> + node_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> + )
> + node_performance[hostname]["insert_times"].append(
> + insert_perf.get("total_time_seconds", 0)
> + )
> + node_performance[hostname]["iterations"].append(
> + len(node_performance[hostname]["insert_rates"])
> + )
> +
> + # Check if we have multiple nodes
> + if len(node_performance) > 1:
> + # Multi-node mode: separate lines for each node
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
> +
> + # Sort nodes with baseline first, then dev
> + sorted_nodes = sorted(
> + node_performance.items(), key=lambda x: (x[1]["is_dev"], x[0])
> + )
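> +            # tuple key: False (baseline) sorts before True (dev), then alphabetically by hostname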
> +
> + # Create color palettes for baseline and dev nodes
> + baseline_colors = [
> + "#2E7D32",
> + "#43A047",
> + "#66BB6A",
> + "#81C784",
> + "#A5D6A7",
> + "#C8E6C9",
> + ] # Greens
> + dev_colors = [
> + "#0D47A1",
> + "#1565C0",
> + "#1976D2",
> + "#1E88E5",
> + "#2196F3",
> + "#42A5F5",
> + "#64B5F6",
> + ] # Blues
> +
> + # Additional colors if needed
> + extra_colors = [
> + "#E65100",
> + "#F57C00",
> + "#FF9800",
> + "#FFB300",
> + "#FFC107",
> + "#FFCA28",
> + ] # Oranges
> +
> + # Line styles to cycle through
> + line_styles = ["-", "--", "-.", ":"]
> + markers = ["o", "s", "^", "v", "D", "p", "*", "h"]
> +
> + baseline_idx = 0
> + dev_idx = 0
> +
> + # Use different colors and styles for each node
> + for idx, (hostname, perf_data) in enumerate(sorted_nodes):
> + if not perf_data["insert_rates"]:
> + continue
> +
> + # Choose color and style based on node type and index
> + if perf_data["is_dev"]:
> + # Development nodes - blues
> + color = dev_colors[dev_idx % len(dev_colors)]
> + linestyle = line_styles[
> + (dev_idx // len(dev_colors)) % len(line_styles)
> + ]
> + marker = markers[4 + (dev_idx % 4)] # Use markers 4-7 for dev
> + label = f"{hostname} (Dev)"
> + dev_idx += 1
> + else:
> + # Baseline nodes - greens
> + color = baseline_colors[baseline_idx % len(baseline_colors)]
> + linestyle = line_styles[
> + (baseline_idx // len(baseline_colors)) % len(line_styles)
> + ]
> + marker = markers[
> + baseline_idx % 4
> + ] # Use first 4 markers for baseline
> + label = f"{hostname} (Baseline)"
> + baseline_idx += 1
> +
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate with alpha for better visibility
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + color=color,
> + linestyle=linestyle,
> + marker=marker,
> + linewidth=1.5,
> + markersize=5,
> + label=label,
> + alpha=0.8,
> + )
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + color=color,
> + linestyle=linestyle,
> + marker=marker,
> + linewidth=1.5,
> + markersize=5,
> + label=label,
> + alpha=0.8,
> + )
> +
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Milvus Insert Rate by Node")
> + ax1.grid(True, alpha=0.3)
> + # Position legend outside plot area for better visibility with many nodes
> + ax1.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
> +
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Milvus Insert Time by Node")
> + ax2.grid(True, alpha=0.3)
> + # Position legend outside plot area for better visibility with many nodes
> + ax2.legend(bbox_to_anchor=(1.05, 1), loc="upper left", fontsize=7, ncol=1)
> +
> + plt.suptitle(
> + "Insert Performance Analysis: Baseline vs Development",
> + fontsize=14,
> + y=1.02,
> + )
> + else:
> + # Single node mode: original behavior
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> + # Extract insert data from single node
> + hostname = list(node_performance.keys())[0] if node_performance else None
> + if hostname:
> + perf_data = node_performance[hostname]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + "b-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title(f"Vector Insert Rate Performance - {hostname}")
> + ax1.grid(True, alpha=0.3)
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + "r-o",
> + linewidth=2,
> + markersize=6,
> + )
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title(f"Vector Insert Time Performance - {hostname}")
> + ax2.grid(True, alpha=0.3)
>
> plt.tight_layout()
> output_file = os.path.join(
> @@ -739,52 +1104,110 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_query_performance(self):
> - """Plot query performance metrics"""
> + """Plot query performance metrics comparing baseline vs dev nodes"""
> if not self.results_data:
> return
>
> - # Collect query performance data
> - query_data = []
> + # Group data by filesystem configuration
> + fs_groups = {}
> for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_groups:
> + fs_groups[config_key] = {"baseline": [], "dev": []}
> +
> query_perf = result.get("query_performance", {})
> - for topk, topk_data in query_perf.items():
> - for batch, batch_data in topk_data.items():
> - query_data.append(
> - {
> - "topk": topk.replace("topk_", ""),
> - "batch": batch.replace("batch_", ""),
> - "qps": batch_data.get("queries_per_second", 0),
> - "avg_time": batch_data.get("average_time_seconds", 0)
> - * 1000, # Convert to ms
> - }
> - )
> + if query_perf:
> + node_type = "dev" if is_dev else "baseline"
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + fs_groups[config_key][node_type].append(
> + {
> + "hostname": hostname,
> + "topk": topk.replace("topk_", ""),
> + "batch": batch.replace("batch_", ""),
> + "qps": batch_data.get("queries_per_second", 0),
> + "avg_time": batch_data.get("average_time_seconds", 0)
> + * 1000,
> + }
> + )
>
> - if not query_data:
> + if not fs_groups:
> return
>
> - df = pd.DataFrame(query_data)
> + # Create subplots for each filesystem config
> + n_configs = len(fs_groups)
> + fig_height = max(8, 4 * n_configs)
> + fig, axes = plt.subplots(n_configs, 2, figsize=(16, fig_height))
>
> - # Create subplots
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> + if n_configs == 1:
> + axes = axes.reshape(1, -1)
>
> - # QPS heatmap
> - qps_pivot = df.pivot_table(
> - values="qps", index="topk", columns="batch", aggfunc="mean"
> - )
> - sns.heatmap(qps_pivot, annot=True, fmt=".1f", ax=ax1, cmap="YlOrRd")
> - ax1.set_title("Queries Per Second (QPS)")
> - ax1.set_xlabel("Batch Size")
> - ax1.set_ylabel("Top-K")
> -
> - # Latency heatmap
> - latency_pivot = df.pivot_table(
> - values="avg_time", index="topk", columns="batch", aggfunc="mean"
> - )
> - sns.heatmap(latency_pivot, annot=True, fmt=".1f", ax=ax2, cmap="YlOrRd")
> - ax2.set_title("Average Query Latency (ms)")
> - ax2.set_xlabel("Batch Size")
> - ax2.set_ylabel("Top-K")
> + for idx, (config_key, data) in enumerate(sorted(fs_groups.items())):
> + # Create DataFrames for baseline and dev
> + baseline_df = (
> + pd.DataFrame(data["baseline"]) if data["baseline"] else pd.DataFrame()
> + )
> + dev_df = pd.DataFrame(data["dev"]) if data["dev"] else pd.DataFrame()
> +
> + # Baseline QPS heatmap
> + ax_base = axes[idx][0]
> + if not baseline_df.empty:
> + baseline_pivot = baseline_df.pivot_table(
> + values="qps", index="topk", columns="batch", aggfunc="mean"
> + )
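> +                # heatmap rows = Top-K, columns = batch size, cells = mean QPS across iterations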
> + sns.heatmap(
> + baseline_pivot,
> + annot=True,
> + fmt=".1f",
> + ax=ax_base,
> + cmap="Greens",
> + cbar_kws={"label": "QPS"},
> + )
> + ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
> + ax_base.set_xlabel("Batch Size")
> + ax_base.set_ylabel("Top-K")
> + else:
> + ax_base.text(
> + 0.5,
> + 0.5,
> + f"No baseline data for {config_key}",
> + ha="center",
> + va="center",
> + transform=ax_base.transAxes,
> + )
> + ax_base.set_title(f"{config_key.upper()} - Baseline QPS")
>
> + # Dev QPS heatmap
> + ax_dev = axes[idx][1]
> + if not dev_df.empty:
> + dev_pivot = dev_df.pivot_table(
> + values="qps", index="topk", columns="batch", aggfunc="mean"
> + )
> + sns.heatmap(
> + dev_pivot,
> + annot=True,
> + fmt=".1f",
> + ax=ax_dev,
> + cmap="Blues",
> + cbar_kws={"label": "QPS"},
> + )
> + ax_dev.set_title(f"{config_key.upper()} - Development QPS")
> + ax_dev.set_xlabel("Batch Size")
> + ax_dev.set_ylabel("Top-K")
> + else:
> + ax_dev.text(
> + 0.5,
> + 0.5,
> + f"No dev data for {config_key}",
> + ha="center",
> + va="center",
> + transform=ax_dev.transAxes,
> + )
> + ax_dev.set_title(f"{config_key.upper()} - Development QPS")
> +
> + plt.suptitle("Query Performance: Baseline vs Development", fontsize=16, y=1.02)
> plt.tight_layout()
> output_file = os.path.join(
> self.output_dir,
> @@ -796,32 +1219,101 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_index_performance(self):
> - """Plot index creation performance"""
> - iterations = []
> - index_times = []
> + """Plot index creation performance comparing baseline vs dev"""
> + # Group by filesystem configuration
> + fs_groups = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_groups:
> + fs_groups[config_key] = {"baseline": [], "dev": []}
>
> - for i, result in enumerate(self.results_data):
> index_perf = result.get("index_performance", {})
> if index_perf:
> - iterations.append(i + 1)
> - index_times.append(index_perf.get("creation_time_seconds", 0))
> + time = index_perf.get("creation_time_seconds", 0)
> + if time > 0:
> + node_type = "dev" if is_dev else "baseline"
> + fs_groups[config_key][node_type].append(time)
>
> - if not index_times:
> + if not fs_groups:
> return
>
> - plt.figure(figsize=(10, 6))
> - plt.bar(iterations, index_times, alpha=0.7, color="green")
> - plt.xlabel("Iteration")
> - plt.ylabel("Index Creation Time (seconds)")
> - plt.title("Index Creation Performance")
> - plt.grid(True, alpha=0.3)
> -
> - # Add average line
> - avg_time = np.mean(index_times)
> - plt.axhline(
> - y=avg_time, color="red", linestyle="--", label=f"Average: {avg_time:.2f}s"
> + # Create comparison bar chart
> + fig, ax = plt.subplots(figsize=(14, 8))
> +
> + configs = sorted(fs_groups.keys())
> + x = np.arange(len(configs))
> + width = 0.35
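> +        # grouped bars: baseline drawn at x - width/2, development at x + width/2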
> +
> + # Calculate averages for each config
> + baseline_avgs = []
> + dev_avgs = []
> + baseline_stds = []
> + dev_stds = []
> +
> + for config in configs:
> + baseline_times = fs_groups[config]["baseline"]
> + dev_times = fs_groups[config]["dev"]
> +
> + baseline_avgs.append(np.mean(baseline_times) if baseline_times else 0)
> + dev_avgs.append(np.mean(dev_times) if dev_times else 0)
> + baseline_stds.append(np.std(baseline_times) if baseline_times else 0)
> + dev_stds.append(np.std(dev_times) if dev_times else 0)
> +
> + # Create bars
> + bars1 = ax.bar(
> + x - width / 2,
> + baseline_avgs,
> + width,
> + yerr=baseline_stds,
> + label="Baseline",
> + color="#4CAF50",
> + capsize=5,
> + )
> + bars2 = ax.bar(
> + x + width / 2,
> + dev_avgs,
> + width,
> + yerr=dev_stds,
> + label="Development",
> + color="#2196F3",
> + capsize=5,
> )
> - plt.legend()
> +
> + # Add value labels on bars
> + for bar, val in zip(bars1, baseline_avgs):
> + if val > 0:
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.3f}s",
> + ha="center",
> + va="bottom",
> + fontsize=9,
> + )
> +
> + for bar, val in zip(bars2, dev_avgs):
> + if val > 0:
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.3f}s",
> + ha="center",
> + va="bottom",
> + fontsize=9,
> + )
> +
> + ax.set_xlabel("Filesystem Configuration", fontsize=12)
> + ax.set_ylabel("Index Creation Time (seconds)", fontsize=12)
> + ax.set_title("Index Creation Performance: Baseline vs Development", fontsize=14)
> + ax.set_xticks(x)
> + ax.set_xticklabels([c.upper() for c in configs], rotation=45, ha="right")
> + ax.legend(loc="upper right")
> + ax.grid(True, alpha=0.3, axis="y")
>
> output_file = os.path.join(
> self.output_dir,
> @@ -833,61 +1325,148 @@ class ResultsAnalyzer:
> plt.close()
>
> def _plot_performance_matrix(self):
> - """Plot comprehensive performance comparison matrix"""
> + """Plot performance comparison matrix for each filesystem config"""
> if len(self.results_data) < 2:
> return
>
> - # Extract key metrics for comparison
> - metrics = []
> - for i, result in enumerate(self.results_data):
> + # Group by filesystem configuration
> + fs_metrics = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> + fs_type, block_size, config_key = self._extract_filesystem_config(result)
> +
> + if config_key not in fs_metrics:
> + fs_metrics[config_key] = {"baseline": [], "dev": []}
> +
> + # Collect metrics
> insert_perf = result.get("insert_performance", {})
> index_perf = result.get("index_performance", {})
> + query_perf = result.get("query_performance", {})
>
> metric = {
> - "iteration": i + 1,
> + "hostname": hostname,
> "insert_rate": insert_perf.get("vectors_per_second", 0),
> "index_time": index_perf.get("creation_time_seconds", 0),
> }
>
> - # Add query metrics
> - query_perf = result.get("query_performance", {})
> + # Get representative query performance (topk_10, batch_1)
> if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
> metric["query_qps"] = query_perf["topk_10"]["batch_1"].get(
> "queries_per_second", 0
> )
> + else:
> + metric["query_qps"] = 0
>
> - metrics.append(metric)
> + node_type = "dev" if is_dev else "baseline"
> + fs_metrics[config_key][node_type].append(metric)
>
> - df = pd.DataFrame(metrics)
> + if not fs_metrics:
> + return
>
> - # Normalize metrics for comparison
> - numeric_cols = ["insert_rate", "index_time", "query_qps"]
> - for col in numeric_cols:
> - if col in df.columns:
> - df[f"{col}_norm"] = (df[col] - df[col].min()) / (
> - df[col].max() - df[col].min() + 1e-6
> - )
> + # Create subplots for each filesystem
> + n_configs = len(fs_metrics)
> + n_cols = min(3, n_configs)
> + n_rows = (n_configs + n_cols - 1) // n_cols
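> +        # e.g. 5 configs -> 3 columns x 2 rows; unused subplots are hidden below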
> +
> + fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols * 6, n_rows * 5))
> + if n_rows == 1 and n_cols == 1:
> + axes = [[axes]]
> + elif n_rows == 1:
> + axes = [axes]
> + elif n_cols == 1:
> + axes = [[ax] for ax in axes]
> +
> + for idx, (config_key, data) in enumerate(sorted(fs_metrics.items())):
> + row = idx // n_cols
> + col = idx % n_cols
> + ax = axes[row][col]
> +
> + # Calculate averages
> + baseline_metrics = data["baseline"]
> + dev_metrics = data["dev"]
> +
> + if baseline_metrics and dev_metrics:
> + categories = ["Insert Rate\n(vec/s)", "Index Time\n(s)", "Query QPS"]
> +
> + baseline_avg = [
> + np.mean([m["insert_rate"] for m in baseline_metrics]),
> + np.mean([m["index_time"] for m in baseline_metrics]),
> + np.mean([m["query_qps"] for m in baseline_metrics]),
> + ]
>
> - # Create radar chart
> - fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection="polar"))
> + dev_avg = [
> + np.mean([m["insert_rate"] for m in dev_metrics]),
> + np.mean([m["index_time"] for m in dev_metrics]),
> + np.mean([m["query_qps"] for m in dev_metrics]),
> + ]
>
> - angles = np.linspace(0, 2 * np.pi, len(numeric_cols), endpoint=False).tolist()
> - angles += angles[:1] # Complete the circle
> + x = np.arange(len(categories))
> + width = 0.35
>
> - for i, row in df.iterrows():
> - values = [row.get(f"{col}_norm", 0) for col in numeric_cols]
> - values += values[:1] # Complete the circle
> + bars1 = ax.bar(
> + x - width / 2,
> + baseline_avg,
> + width,
> + label="Baseline",
> + color="#4CAF50",
> + )
> + bars2 = ax.bar(
> + x + width / 2, dev_avg, width, label="Development", color="#2196F3"
> + )
>
> - ax.plot(
> - angles, values, "o-", linewidth=2, label=f'Iteration {row["iteration"]}'
> - )
> - ax.fill(angles, values, alpha=0.25)
> + # Add value labels
> + for bar, val in zip(bars1, baseline_avg):
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.0f}" if val > 100 else f"{val:.2f}",
> + ha="center",
> + va="bottom",
> + fontsize=8,
> + )
>
> - ax.set_xticks(angles[:-1])
> - ax.set_xticklabels(["Insert Rate", "Index Time (inv)", "Query QPS"])
> - ax.set_ylim(0, 1)
> - ax.set_title("Performance Comparison Matrix (Normalized)", y=1.08)
> - ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.0))
> + for bar, val in zip(bars2, dev_avg):
> + height = bar.get_height()
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height,
> + f"{val:.0f}" if val > 100 else f"{val:.2f}",
> + ha="center",
> + va="bottom",
> + fontsize=8,
> + )
> +
> + ax.set_xlabel("Metrics")
> + ax.set_ylabel("Value")
> + ax.set_title(f"{config_key.upper()}")
> + ax.set_xticks(x)
> + ax.set_xticklabels(categories)
> + ax.legend(loc="upper right", fontsize=8)
> + ax.grid(True, alpha=0.3, axis="y")
> + else:
> + ax.text(
> + 0.5,
> + 0.5,
> + f"Insufficient data\nfor {config_key}",
> + ha="center",
> + va="center",
> + transform=ax.transAxes,
> + )
> + ax.set_title(f"{config_key.upper()}")
> +
> + # Hide unused subplots
> + for idx in range(n_configs, n_rows * n_cols):
> + row = idx // n_cols
> + col = idx % n_cols
> + axes[row][col].set_visible(False)
> +
> + plt.suptitle(
> + "Performance Comparison Matrix: Baseline vs Development",
> + fontsize=14,
> + y=1.02,
> + )
>
> output_file = os.path.join(
> self.output_dir,
> @@ -898,6 +1477,149 @@ class ResultsAnalyzer:
> )
> plt.close()
>
> + def _plot_filesystem_comparison(self):
> + """Plot node performance comparison chart"""
> + if len(self.results_data) < 2:
> + return
> +
> + # Group results by node
> + node_performance = {}
> +
> + for result in self.results_data:
> + hostname, is_dev = self._extract_node_info(result)
> +
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "insert_rates": [],
> + "index_times": [],
> + "query_qps": [],
> + "is_dev": is_dev,
> + }
> +
> + # Collect metrics
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + node_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> + )
> +
> + index_perf = result.get("index_performance", {})
> + if index_perf:
> + node_performance[hostname]["index_times"].append(
> + index_perf.get("creation_time_seconds", 0)
> + )
> +
> + # Get top-10 batch-1 query performance as representative
> + query_perf = result.get("query_performance", {})
> + if "topk_10" in query_perf and "batch_1" in query_perf["topk_10"]:
> + qps = query_perf["topk_10"]["batch_1"].get("queries_per_second", 0)
> + node_performance[hostname]["query_qps"].append(qps)
> +
> + # Only create comparison if we have multiple nodes
> + if len(node_performance) > 1:
> + # Calculate averages
> + node_metrics = {}
> + for hostname, perf_data in node_performance.items():
> + node_metrics[hostname] = {
> + "avg_insert_rate": (
> + np.mean(perf_data["insert_rates"])
> + if perf_data["insert_rates"]
> + else 0
> + ),
> + "avg_index_time": (
> + np.mean(perf_data["index_times"])
> + if perf_data["index_times"]
> + else 0
> + ),
> + "avg_query_qps": (
> + np.mean(perf_data["query_qps"]) if perf_data["query_qps"] else 0
> + ),
> + "is_dev": perf_data["is_dev"],
> + }
> +
> + # Create comparison bar chart with more space
> + fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
> +
> + # Sort nodes with baseline first
> + sorted_nodes = sorted(
> + node_metrics.items(), key=lambda x: (x[1]["is_dev"], x[0])
> + )
> + node_names = [hostname for hostname, _ in sorted_nodes]
> +
> + # Use different colors for baseline vs dev
> + colors = [
> + "#4CAF50" if not node_metrics[hostname]["is_dev"] else "#2196F3"
> + for hostname in node_names
> + ]
> +
> + # Add labels for clarity
> + labels = [
> + f"{hostname}\n({'Dev' if node_metrics[hostname]['is_dev'] else 'Baseline'})"
> + for hostname in node_names
> + ]
> +
> + # Insert rate comparison
> + insert_rates = [
> + node_metrics[hostname]["avg_insert_rate"] for hostname in node_names
> + ]
> + bars1 = ax1.bar(labels, insert_rates, color=colors)
> + ax1.set_title("Average Milvus Insert Rate by Node")
> + ax1.set_ylabel("Vectors/Second")
> + # Rotate labels for better readability
> + ax1.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Index time comparison (lower is better)
> + index_times = [
> + node_metrics[hostname]["avg_index_time"] for hostname in node_names
> + ]
> + bars2 = ax2.bar(labels, index_times, color=colors)
> + ax2.set_title("Average Milvus Index Time by Node")
> + ax2.set_ylabel("Seconds (Lower is Better)")
> + ax2.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Query QPS comparison
> + query_qps = [
> + node_metrics[hostname]["avg_query_qps"] for hostname in node_names
> + ]
> + bars3 = ax3.bar(labels, query_qps, color=colors)
> + ax3.set_title("Average Milvus Query QPS by Node")
> + ax3.set_ylabel("Queries/Second")
> + ax3.set_xticklabels(labels, rotation=45, ha="right", fontsize=8)
> +
> + # Add value labels on bars
> + for bars, values in [
> + (bars1, insert_rates),
> + (bars2, index_times),
> + (bars3, query_qps),
> + ]:
> + for bar, value in zip(bars, values):
> + height = bar.get_height()
> + ax = bar.axes
> + ax.text(
> + bar.get_x() + bar.get_width() / 2.0,
> + height + height * 0.01,
> + f"{value:.1f}",
> + ha="center",
> + va="bottom",
> + fontsize=10,
> + )
> +
> + plt.suptitle(
> + "Milvus Performance Comparison: Baseline vs Development Nodes",
> + fontsize=16,
> + y=1.02,
> + )
> + plt.tight_layout()
> +
> + output_file = os.path.join(
> + self.output_dir,
> + f"filesystem_comparison.{self.config.get('graph_format', 'png')}",
> + )
> + plt.savefig(
> + output_file, dpi=self.config.get("graph_dpi", 300), bbox_inches="tight"
> + )
> + plt.close()
> +
> def analyze(self) -> bool:
> """Run complete analysis"""
> self.logger.info("Starting results analysis...")
> diff --git a/workflows/ai/scripts/generate_graphs.py b/workflows/ai/scripts/generate_graphs.py
> index 2e183e86..fafc62bf 100755
> --- a/workflows/ai/scripts/generate_graphs.py
> +++ b/workflows/ai/scripts/generate_graphs.py
> @@ -9,7 +9,6 @@ import sys
> import glob
> import numpy as np
> import matplotlib
> -
> matplotlib.use("Agg") # Use non-interactive backend
> import matplotlib.pyplot as plt
> from datetime import datetime
> @@ -17,6 +16,66 @@ from pathlib import Path
> from collections import defaultdict
>
>
> +def _extract_filesystem_config(result):
> + """Extract filesystem type and block size from result data.
> + Returns (fs_type, block_size, config_key)"""
> + filename = result.get("_file", "")
> +
> + # Primary: Extract filesystem type from filename (more reliable than JSON)
> + fs_type = "unknown"
> + block_size = "default"
> +
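> +    # e.g. "results_debian13-ai-xfs-16k-4ks-dev_1.json" -> ("xfs", "16k", "xfs-16k")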
> + if "xfs" in filename:
> + fs_type = "xfs"
> + # Check larger sizes first to avoid substring matches
> + if "64k" in filename and "64k-" in filename:
> + block_size = "64k"
> + elif "32k" in filename and "32k-" in filename:
> + block_size = "32k"
> + elif "16k" in filename and "16k-" in filename:
> + block_size = "16k"
> + elif "4k" in filename and "4k-" in filename:
> + block_size = "4k"
> + elif "ext4" in filename:
> + fs_type = "ext4"
> + if "4k" in filename and "4k-" in filename:
> + block_size = "4k"
> + elif "16k" in filename and "16k-" in filename:
> + block_size = "16k"
> + elif "btrfs" in filename:
> + fs_type = "btrfs"
> +
> + # Fallback: Check JSON data if filename parsing failed
> + if fs_type == "unknown":
> + fs_type = result.get("filesystem", "unknown")
> +
> + # Create descriptive config key
> + config_key = f"{fs_type}-{block_size}" if block_size != "default" else fs_type
> + return fs_type, block_size, config_key
> +
> +
> +def _extract_node_info(result):
> + """Extract node hostname and determine if it's a dev node.
> + Returns (hostname, is_dev_node)"""
> + # Get hostname from system_info (preferred) or fall back to filename
> + system_info = result.get("system_info", {})
> + hostname = system_info.get("hostname", "")
> +
> + # If no hostname in system_info, try extracting from filename
> + if not hostname:
> + filename = result.get("_file", "")
> + # Remove results_ prefix and .json suffix
> + hostname = filename.replace("results_", "").replace(".json", "")
> + # Remove iteration number if present (_1, _2, etc.)
> + if "_" in hostname and hostname.split("_")[-1].isdigit():
> + hostname = "_".join(hostname.split("_")[:-1])
> +
> + # Determine if this is a dev node
> + is_dev = hostname.endswith("-dev")
> +
> + return hostname, is_dev
> +
> +
> def load_results(results_dir):
> """Load all JSON result files from the directory"""
> results = []
> @@ -27,63 +86,8 @@ def load_results(results_dir):
> try:
> with open(json_file, "r") as f:
> data = json.load(f)
> - # Extract filesystem info - prefer from JSON data over filename
> - filename = os.path.basename(json_file)
> -
> - # First, try to get filesystem from the JSON data itself
> - fs_type = data.get("filesystem", None)
> -
> - # If not in JSON, try to parse from filename (backwards compatibility)
> - if not fs_type:
> - parts = (
> - filename.replace("results_", "").replace(".json", "").split("-")
> - )
> -
> - # Parse host info
> - if "debian13-ai-" in filename:
> - host_parts = (
> - filename.replace("results_debian13-ai-", "")
> - .replace("_1.json", "")
> - .replace("_2.json", "")
> - .replace("_3.json", "")
> - .split("-")
> - )
> - if "xfs" in host_parts[0]:
> - fs_type = "xfs"
> - # Extract block size (e.g., "4k", "16k", etc.)
> - block_size = (
> - host_parts[1] if len(host_parts) > 1 else "unknown"
> - )
> - elif "ext4" in host_parts[0]:
> - fs_type = "ext4"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "btrfs" in host_parts[0]:
> - fs_type = "btrfs"
> - block_size = "default"
> - else:
> - fs_type = "unknown"
> - block_size = "unknown"
> - else:
> - fs_type = "unknown"
> - block_size = "unknown"
> - else:
> - # If filesystem came from JSON, set appropriate block size
> - if fs_type == "btrfs":
> - block_size = "default"
> - elif fs_type in ["ext4", "xfs"]:
> - block_size = data.get("block_size", "4k")
> - else:
> - block_size = data.get("block_size", "default")
> -
> - is_dev = "dev" in filename
> -
> - # Use filesystem from JSON if available, otherwise use parsed value
> - if "filesystem" not in data:
> - data["filesystem"] = fs_type
> - data["block_size"] = block_size
> - data["is_dev"] = is_dev
> - data["filename"] = filename
> -
> + # Add filename for filesystem detection
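> +            # "_file" is consumed by _extract_filesystem_config() and _extract_node_info()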
> + data["_file"] = os.path.basename(json_file)
> results.append(data)
> except Exception as e:
> print(f"Error loading {json_file}: {e}")
> @@ -91,1023 +95,240 @@ def load_results(results_dir):
> return results
>
>
> -def create_filesystem_comparison_chart(results, output_dir):
> - """Create a bar chart comparing performance across filesystems"""
> - # Group by filesystem and baseline/dev
> - fs_data = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - category = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Extract actual performance data from results
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> - fs_data[fs][category].append(insert_qps)
> -
> - # Prepare data for plotting
> - filesystems = list(fs_data.keys())
> - baseline_means = [
> - np.mean(fs_data[fs]["baseline"]) if fs_data[fs]["baseline"] else 0
> - for fs in filesystems
> - ]
> - dev_means = [
> - np.mean(fs_data[fs]["dev"]) if fs_data[fs]["dev"] else 0 for fs in filesystems
> - ]
> -
> - x = np.arange(len(filesystems))
> - width = 0.35
> -
> - fig, ax = plt.subplots(figsize=(10, 6))
> - baseline_bars = ax.bar(
> - x - width / 2, baseline_means, width, label="Baseline", color="#1f77b4"
> - )
> - dev_bars = ax.bar(
> - x + width / 2, dev_means, width, label="Development", color="#ff7f0e"
> - )
> -
> - ax.set_xlabel("Filesystem")
> - ax.set_ylabel("Insert QPS")
> - ax.set_title("Vector Database Performance by Filesystem")
> - ax.set_xticks(x)
> - ax.set_xticklabels(filesystems)
> - ax.legend()
> - ax.grid(True, alpha=0.3)
> -
> - # Add value labels on bars
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax.annotate(
> - f"{height:.0f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - )
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "filesystem_comparison.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_block_size_analysis(results, output_dir):
> - """Create analysis for different block sizes (XFS specific)"""
> - # Filter XFS results
> - xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
> -
> - if not xfs_results:
> +def create_simple_performance_trends(results, output_dir):
> + """Create multi-node performance trends chart"""
> + if not results:
> return
>
> - # Group by block size
> - block_size_data = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in xfs_results:
> - block_size = result.get("block_size", "unknown")
> - category = "dev" if result.get("is_dev", False) else "baseline"
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> - block_size_data[block_size][category].append(insert_qps)
> -
> - # Sort block sizes
> - block_sizes = sorted(
> - block_size_data.keys(),
> - key=lambda x: (
> - int(x.replace("k", "").replace("s", ""))
> - if x not in ["unknown", "default"]
> - else 0
> - ),
> - )
> -
> - # Create grouped bar chart
> - baseline_means = [
> - (
> - np.mean(block_size_data[bs]["baseline"])
> - if block_size_data[bs]["baseline"]
> - else 0
> - )
> - for bs in block_sizes
> - ]
> - dev_means = [
> - np.mean(block_size_data[bs]["dev"]) if block_size_data[bs]["dev"] else 0
> - for bs in block_sizes
> - ]
> -
> - x = np.arange(len(block_sizes))
> - width = 0.35
> -
> - fig, ax = plt.subplots(figsize=(12, 6))
> - baseline_bars = ax.bar(
> - x - width / 2, baseline_means, width, label="Baseline", color="#2ca02c"
> - )
> - dev_bars = ax.bar(
> - x + width / 2, dev_means, width, label="Development", color="#d62728"
> - )
> -
> - ax.set_xlabel("Block Size")
> - ax.set_ylabel("Insert QPS")
> - ax.set_title("XFS Performance by Block Size")
> - ax.set_xticks(x)
> - ax.set_xticklabels(block_sizes)
> - ax.legend()
> - ax.grid(True, alpha=0.3)
> -
> - # Add value labels
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax.annotate(
> - f"{height:.0f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - )
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "xfs_block_size_analysis.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_heatmap_analysis(results, output_dir):
> - """Create a heatmap showing AVERAGE performance across all test iterations"""
> - # Group data by configuration and version, collecting ALL values for averaging
> - config_data = defaultdict(
> - lambda: {
> - "baseline": {"insert": [], "query": [], "count": 0},
> - "dev": {"insert": [], "query": [], "count": 0},
> - }
> - )
> + # Group results by node
> + node_performance = defaultdict(lambda: {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": False,
> + })
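> +    # keyed by hostname; "-dev" hosts are the development nodes (see _extract_node_info)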
>
> for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - config = f"{fs}-{block_size}"
> - version = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Get actual insert performance
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> -
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - # Collect all values for averaging
> - config_data[config][version]["insert"].append(insert_qps)
> - config_data[config][version]["query"].append(query_qps)
> - config_data[config][version]["count"] += 1
> -
> - # Sort configurations
> - configs = sorted(config_data.keys())
> -
> - # Calculate averages for heatmap
> - insert_baseline = []
> - insert_dev = []
> - query_baseline = []
> - query_dev = []
> - iteration_counts = {"baseline": 0, "dev": 0}
> -
> - for c in configs:
> - # Calculate average insert QPS
> - baseline_insert_vals = config_data[c]["baseline"]["insert"]
> - insert_baseline.append(
> - np.mean(baseline_insert_vals) if baseline_insert_vals else 0
> - )
> -
> - dev_insert_vals = config_data[c]["dev"]["insert"]
> - insert_dev.append(np.mean(dev_insert_vals) if dev_insert_vals else 0)
> -
> - # Calculate average query QPS
> - baseline_query_vals = config_data[c]["baseline"]["query"]
> - query_baseline.append(
> - np.mean(baseline_query_vals) if baseline_query_vals else 0
> - )
> -
> - dev_query_vals = config_data[c]["dev"]["query"]
> - query_dev.append(np.mean(dev_query_vals) if dev_query_vals else 0)
> -
> - # Track iteration counts
> - iteration_counts["baseline"] = max(
> - iteration_counts["baseline"], len(baseline_insert_vals)
> - )
> - iteration_counts["dev"] = max(iteration_counts["dev"], len(dev_insert_vals))
> -
> - # Create figure with custom heatmap
> - fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))
> -
> - # Create data matrices
> - insert_data = np.array([insert_baseline, insert_dev]).T
> - query_data = np.array([query_baseline, query_dev]).T
> -
> - # Insert QPS heatmap
> - im1 = ax1.imshow(insert_data, cmap="YlOrRd", aspect="auto")
> - ax1.set_xticks([0, 1])
> - ax1.set_xticklabels(["Baseline", "Development"])
> - ax1.set_yticks(range(len(configs)))
> - ax1.set_yticklabels(configs)
> - ax1.set_title(
> - f"Insert Performance - AVERAGE across {iteration_counts['baseline']} iterations\n(1M vectors, 128 dims, HNSW index)"
> - )
> - ax1.set_ylabel("Configuration")
> -
> - # Add text annotations with dynamic color based on background
> - # Get the colormap to determine actual colors
> - cmap1 = plt.cm.YlOrRd
> - norm1 = plt.Normalize(vmin=insert_data.min(), vmax=insert_data.max())
> -
> - for i in range(len(configs)):
> - for j in range(2):
> - # Get the actual color from the colormap
> - val = insert_data[i, j]
> - rgba = cmap1(norm1(val))
> - # Calculate luminance using standard formula
> - # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
> - luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
> - # Use white text on dark backgrounds (low luminance)
> - text_color = "white" if luminance < 0.5 else "black"
> + hostname, is_dev = _extract_node_info(result)
> +
> + if hostname not in node_performance:
> + node_performance[hostname] = {
> + "insert_rates": [],
> + "insert_times": [],
> + "iterations": [],
> + "is_dev": is_dev,
> + }
>
> - # Show average value with indicator
> - text = ax1.text(
> - j,
> - i,
> - f"{int(insert_data[i, j])}\n(avg)",
> - ha="center",
> - va="center",
> - color=text_color,
> - fontweight="bold",
> - fontsize=9,
> + insert_perf = result.get("insert_performance", {})
> + if insert_perf:
> + node_performance[hostname]["insert_rates"].append(
> + insert_perf.get("vectors_per_second", 0)
> )
> -
> - # Add colorbar
> - cbar1 = plt.colorbar(im1, ax=ax1)
> - cbar1.set_label("Insert QPS")
> -
> - # Query QPS heatmap
> - im2 = ax2.imshow(query_data, cmap="YlGnBu", aspect="auto")
> - ax2.set_xticks([0, 1])
> - ax2.set_xticklabels(["Baseline", "Development"])
> - ax2.set_yticks(range(len(configs)))
> - ax2.set_yticklabels(configs)
> - ax2.set_title(
> - f"Query Performance - AVERAGE across {iteration_counts['dev']} iterations\n(1M vectors, 128 dims, HNSW index)"
> - )
> -
> - # Add text annotations with dynamic color based on background
> - # Get the colormap to determine actual colors
> - cmap2 = plt.cm.YlGnBu
> - norm2 = plt.Normalize(vmin=query_data.min(), vmax=query_data.max())
> -
> - for i in range(len(configs)):
> - for j in range(2):
> - # Get the actual color from the colormap
> - val = query_data[i, j]
> - rgba = cmap2(norm2(val))
> - # Calculate luminance using standard formula
> - # Perceived luminance: 0.299*R + 0.587*G + 0.114*B
> - luminance = 0.299 * rgba[0] + 0.587 * rgba[1] + 0.114 * rgba[2]
> - # Use white text on dark backgrounds (low luminance)
> - text_color = "white" if luminance < 0.5 else "black"
> -
> - # Show average value with indicator
> - text = ax2.text(
> - j,
> - i,
> - f"{int(query_data[i, j])}\n(avg)",
> - ha="center",
> - va="center",
> - color=text_color,
> - fontweight="bold",
> - fontsize=9,
> + fs_performance[config_key]["insert_times"].append(
> + insert_perf.get("total_time_seconds", 0)
> + )
> + fs_performance[config_key]["iterations"].append(
> + len(fs_performance[config_key]["insert_rates"])
> )
>
> - # Add colorbar
> - cbar2 = plt.colorbar(im2, ax=ax2)
> - cbar2.set_label("Query QPS")
> -
> - # Add overall figure title
> - fig.suptitle(
> - "Performance Heatmap - Showing AVERAGES across Multiple Test Iterations",
> - fontsize=14,
> - fontweight="bold",
> - y=1.02,
> - )
> -
> - plt.tight_layout()
> - plt.savefig(
> - os.path.join(output_dir, "performance_heatmap.png"),
> - dpi=150,
> - bbox_inches="tight",
> - )
> - plt.close()
> -
> -
> -def create_performance_trends(results, output_dir):
> - """Create line charts showing performance trends"""
> - # Group by filesystem type
> - fs_types = defaultdict(
> - lambda: {
> - "configs": [],
> - "baseline_insert": [],
> - "dev_insert": [],
> - "baseline_query": [],
> - "dev_query": [],
> - }
> - )
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - config = f"{block_size}"
> -
> - if config not in fs_types[fs]["configs"]:
> - fs_types[fs]["configs"].append(config)
> - fs_types[fs]["baseline_insert"].append(0)
> - fs_types[fs]["dev_insert"].append(0)
> - fs_types[fs]["baseline_query"].append(0)
> - fs_types[fs]["dev_query"].append(0)
> -
> - idx = fs_types[fs]["configs"].index(config)
> -
> - # Calculate average query QPS from all test configurations
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - if result.get("is_dev", False):
> - if "insert_performance" in result:
> - fs_types[fs]["dev_insert"][idx] = result["insert_performance"].get(
> - "vectors_per_second", 0
> - )
> - fs_types[fs]["dev_query"][idx] = query_qps
> - else:
> - if "insert_performance" in result:
> - fs_types[fs]["baseline_insert"][idx] = result["insert_performance"].get(
> - "vectors_per_second", 0
> - )
> - fs_types[fs]["baseline_query"][idx] = query_qps
> -
> - # Create separate plots for each filesystem
> - for fs, data in fs_types.items():
> - if not data["configs"]:
> - continue
> -
> - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
> -
> - x = range(len(data["configs"]))
> -
> - # Insert performance
> - ax1.plot(
> - x,
> - data["baseline_insert"],
> - "o-",
> - label="Baseline",
> - linewidth=2,
> - markersize=8,
> - )
> - ax1.plot(
> - x, data["dev_insert"], "s-", label="Development", linewidth=2, markersize=8
> - )
> - ax1.set_xlabel("Configuration")
> - ax1.set_ylabel("Insert QPS")
> - ax1.set_title(f"{fs.upper()} Insert Performance")
> - ax1.set_xticks(x)
> - ax1.set_xticklabels(data["configs"])
> - ax1.legend()
> +    # Check if we have results from multiple nodes
> +    if len(node_performance) > 1:
> +        # Multi-node mode: separate lines for each node
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> + colors = ["b", "r", "g", "m", "c", "y", "k"]
> + color_idx = 0
> +
> +        for hostname, perf_data in node_performance.items():
> + if not perf_data["insert_rates"]:
> + continue
> +
> + color = colors[color_idx % len(colors)]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + f"{color}-o",
> + linewidth=2,
> + markersize=6,
> +                label=hostname,
> + )
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + f"{color}-o",
> + linewidth=2,
> + markersize=6,
> +                label=hostname,
> + )
> +
> + color_idx += 1
> +
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Milvus Insert Rate by Storage Filesystem")
> ax1.grid(True, alpha=0.3)
> -
> - # Query performance
> - ax2.plot(
> - x, data["baseline_query"], "o-", label="Baseline", linewidth=2, markersize=8
> - )
> - ax2.plot(
> - x, data["dev_query"], "s-", label="Development", linewidth=2, markersize=8
> - )
> - ax2.set_xlabel("Configuration")
> - ax2.set_ylabel("Query QPS")
> - ax2.set_title(f"{fs.upper()} Query Performance")
> - ax2.set_xticks(x)
> - ax2.set_xticklabels(data["configs"])
> - ax2.legend()
> + ax1.legend()
> +
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Milvus Insert Time by Storage Filesystem")
> ax2.grid(True, alpha=0.3)
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, f"{fs}_performance_trends.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_simple_performance_trends(results, output_dir):
> - """Create a simple performance trends chart for basic Milvus testing"""
> - if not results:
> - return
> -
> - # Extract configuration from first result for display
> - config_text = ""
> - if results:
> - first_result = results[0]
> - if "config" in first_result:
> - cfg = first_result["config"]
> - config_text = (
> - f"Test Config:\n"
> - f"• {cfg.get('vector_dataset_size', 'N/A'):,} vectors/iteration\n"
> - f"• {cfg.get('vector_dimensions', 'N/A')} dimensions\n"
> - f"• {cfg.get('index_type', 'N/A')} index"
> + ax2.legend()
> + else:
> +        # Single node mode: original behavior
> + fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
> +
> +        # Extract insert data from the single node
> +        hostname = list(node_performance.keys())[0] if node_performance else None
> +        if hostname:
> +            perf_data = node_performance[hostname]
> + iterations = list(range(1, len(perf_data["insert_rates"]) + 1))
> +
> + # Plot insert rate
> + ax1.plot(
> + iterations,
> + perf_data["insert_rates"],
> + "b-o",
> + linewidth=2,
> + markersize=6,
> )
> -
> - # Separate baseline and dev results
> - baseline_results = [r for r in results if not r.get("is_dev", False)]
> - dev_results = [r for r in results if r.get("is_dev", False)]
> -
> - if not baseline_results and not dev_results:
> - return
> -
> - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 10))
> -
> - # Prepare data
> - baseline_insert = []
> - baseline_query = []
> - dev_insert = []
> - dev_query = []
> - labels = []
> -
> - # Process baseline results
> - for i, result in enumerate(baseline_results):
> - if "insert_performance" in result:
> - baseline_insert.append(
> - result["insert_performance"].get("vectors_per_second", 0)
> + ax1.set_xlabel("Iteration")
> + ax1.set_ylabel("Vectors/Second")
> + ax1.set_title("Vector Insert Rate Performance")
> + ax1.grid(True, alpha=0.3)
> +
> + # Plot insert time
> + ax2.plot(
> + iterations,
> + perf_data["insert_times"],
> + "r-o",
> + linewidth=2,
> + markersize=6,
> )
> - else:
> - baseline_insert.append(0)
> -
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> - baseline_query.append(query_qps)
> - labels.append(f"Iteration {i+1}")
> -
> - # Process dev results
> - for result in dev_results:
> - if "insert_performance" in result:
> - dev_insert.append(result["insert_performance"].get("vectors_per_second", 0))
> - else:
> - dev_insert.append(0)
> -
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> - dev_query.append(query_qps)
> -
> - x = range(len(baseline_results) if baseline_results else len(dev_results))
> -
> - # Insert performance - with visible markers for all points
> - if baseline_insert:
> - # Line plot with smaller markers
> - ax1.plot(
> - x,
> - baseline_insert,
> - "-",
> - label="Baseline",
> - linewidth=1.5,
> - color="blue",
> - alpha=0.6,
> - )
> - # Add distinct markers for each point
> - ax1.scatter(
> - x,
> - baseline_insert,
> - s=30,
> - color="blue",
> - alpha=0.8,
> - edgecolors="darkblue",
> - linewidth=0.5,
> - zorder=5,
> - )
> - if dev_insert:
> - # Line plot with smaller markers
> - ax1.plot(
> - x[: len(dev_insert)],
> - dev_insert,
> - "-",
> - label="Development",
> - linewidth=1.5,
> - color="red",
> - alpha=0.6,
> - )
> - # Add distinct markers for each point
> - ax1.scatter(
> - x[: len(dev_insert)],
> - dev_insert,
> - s=30,
> - color="red",
> - alpha=0.8,
> - edgecolors="darkred",
> - linewidth=0.5,
> - marker="s",
> - zorder=5,
> - )
> - ax1.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
> - ax1.set_ylabel("Insert QPS (vectors/second)")
> - ax1.set_title("Milvus Insert Performance")
> -
> - # Handle x-axis labels to prevent overlap
> - num_points = len(x)
> - if num_points > 20:
> - # Show every 5th label for many iterations
> - step = 5
> - tick_positions = list(range(0, num_points, step))
> - tick_labels = [
> - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
> - ]
> - ax1.set_xticks(tick_positions)
> - ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
> - elif num_points > 10:
> - # Show every 2nd label for moderate iterations
> - step = 2
> - tick_positions = list(range(0, num_points, step))
> - tick_labels = [
> - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
> - ]
> - ax1.set_xticks(tick_positions)
> - ax1.set_xticklabels(tick_labels, rotation=45, ha="right")
> - else:
> - # Show all labels for few iterations
> - ax1.set_xticks(x)
> - ax1.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
> -
> - ax1.legend()
> - ax1.grid(True, alpha=0.3)
> -
> - # Add configuration text box - compact
> - if config_text:
> - ax1.text(
> - 0.02,
> - 0.98,
> - config_text,
> - transform=ax1.transAxes,
> - fontsize=6,
> - verticalalignment="top",
> - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
> - )
> -
> - # Query performance - with visible markers for all points
> - if baseline_query:
> - # Line plot
> - ax2.plot(
> - x,
> - baseline_query,
> - "-",
> - label="Baseline",
> - linewidth=1.5,
> - color="blue",
> - alpha=0.6,
> - )
> - # Add distinct markers for each point
> - ax2.scatter(
> - x,
> - baseline_query,
> - s=30,
> - color="blue",
> - alpha=0.8,
> - edgecolors="darkblue",
> - linewidth=0.5,
> - zorder=5,
> - )
> - if dev_query:
> - # Line plot
> - ax2.plot(
> - x[: len(dev_query)],
> - dev_query,
> - "-",
> - label="Development",
> - linewidth=1.5,
> - color="red",
> - alpha=0.6,
> - )
> - # Add distinct markers for each point
> - ax2.scatter(
> - x[: len(dev_query)],
> - dev_query,
> - s=30,
> - color="red",
> - alpha=0.8,
> - edgecolors="darkred",
> - linewidth=0.5,
> - marker="s",
> - zorder=5,
> - )
> - ax2.set_xlabel("Test Iteration (same configuration, repeated for reliability)")
> - ax2.set_ylabel("Query QPS (queries/second)")
> - ax2.set_title("Milvus Query Performance")
> -
> - # Handle x-axis labels to prevent overlap
> - num_points = len(x)
> - if num_points > 20:
> - # Show every 5th label for many iterations
> - step = 5
> - tick_positions = list(range(0, num_points, step))
> - tick_labels = [
> - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
> - ]
> - ax2.set_xticks(tick_positions)
> - ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
> - elif num_points > 10:
> - # Show every 2nd label for moderate iterations
> - step = 2
> - tick_positions = list(range(0, num_points, step))
> - tick_labels = [
> - labels[i] if labels else f"Iteration {i+1}" for i in tick_positions
> - ]
> - ax2.set_xticks(tick_positions)
> - ax2.set_xticklabels(tick_labels, rotation=45, ha="right")
> - else:
> - # Show all labels for few iterations
> - ax2.set_xticks(x)
> - ax2.set_xticklabels(labels if labels else [f"Iteration {i+1}" for i in x])
> -
> - ax2.legend()
> - ax2.grid(True, alpha=0.3)
> -
> - # Add configuration text box - compact
> - if config_text:
> - ax2.text(
> - 0.02,
> - 0.98,
> - config_text,
> - transform=ax2.transAxes,
> - fontsize=6,
> - verticalalignment="top",
> - bbox=dict(boxstyle="round,pad=0.3", facecolor="wheat", alpha=0.85),
> - )
> -
> + ax2.set_xlabel("Iteration")
> + ax2.set_ylabel("Total Time (seconds)")
> + ax2.set_title("Vector Insert Time Performance")
> + ax2.grid(True, alpha=0.3)
> +
> plt.tight_layout()
> plt.savefig(os.path.join(output_dir, "performance_trends.png"), dpi=150)
> plt.close()
>
>
> -def generate_summary_statistics(results, output_dir):
> - """Generate summary statistics and save to JSON"""
> - # Get unique filesystems, excluding "unknown"
> - filesystems = set()
> - for r in results:
> - fs = r.get("filesystem", "unknown")
> - if fs != "unknown":
> - filesystems.add(fs)
> -
> - summary = {
> - "total_tests": len(results),
> - "filesystems_tested": sorted(list(filesystems)),
> - "configurations": {},
> - "performance_summary": {
> - "best_insert_qps": {"value": 0, "config": ""},
> - "best_query_qps": {"value": 0, "config": ""},
> - "average_insert_qps": 0,
> - "average_query_qps": 0,
> - },
> - }
> -
> - # Calculate statistics
> - all_insert_qps = []
> - all_query_qps = []
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "default")
> - is_dev = "dev" if result.get("is_dev", False) else "baseline"
> - config_name = f"{fs}-{block_size}-{is_dev}"
> -
> - # Get actual performance metrics
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> -
> - # Calculate average query QPS
> - query_qps = 0
> - if "query_performance" in result:
> - qp = result["query_performance"]
> - total_qps = 0
> - count = 0
> - for topk_key in ["topk_1", "topk_10", "topk_100"]:
> - if topk_key in qp:
> - for batch_key in ["batch_1", "batch_10", "batch_100"]:
> - if batch_key in qp[topk_key]:
> - total_qps += qp[topk_key][batch_key].get(
> - "queries_per_second", 0
> - )
> - count += 1
> - if count > 0:
> - query_qps = total_qps / count
> -
> - all_insert_qps.append(insert_qps)
> - all_query_qps.append(query_qps)
> -
> - summary["configurations"][config_name] = {
> - "insert_qps": insert_qps,
> - "query_qps": query_qps,
> - "host": result.get("host", "unknown"),
> - }
> -
> - if insert_qps > summary["performance_summary"]["best_insert_qps"]["value"]:
> - summary["performance_summary"]["best_insert_qps"] = {
> - "value": insert_qps,
> - "config": config_name,
> - }
> -
> - if query_qps > summary["performance_summary"]["best_query_qps"]["value"]:
> - summary["performance_summary"]["best_query_qps"] = {
> - "value": query_qps,
> - "config": config_name,
> - }
> -
> - summary["performance_summary"]["average_insert_qps"] = (
> - np.mean(all_insert_qps) if all_insert_qps else 0
> - )
> - summary["performance_summary"]["average_query_qps"] = (
> - np.mean(all_query_qps) if all_query_qps else 0
> - )
> -
> - # Save summary
> - with open(os.path.join(output_dir, "summary.json"), "w") as f:
> - json.dump(summary, f, indent=2)
> -
> - return summary
> -
> -
> -def create_comprehensive_fs_comparison(results, output_dir):
> - """Create comprehensive filesystem performance comparison including all configurations"""
> - import matplotlib.pyplot as plt
> - import numpy as np
> - from collections import defaultdict
> -
> - # Collect data for all filesystem configurations
> - config_data = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "")
> -
> - # Create configuration label
> - if block_size and block_size != "default":
> - config_label = f"{fs}-{block_size}"
> - else:
> - config_label = fs
> -
> - category = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Extract performance metrics
> - if "insert_performance" in result:
> - insert_qps = result["insert_performance"].get("vectors_per_second", 0)
> - else:
> - insert_qps = 0
> -
> - config_data[config_label][category].append(insert_qps)
> -
> - # Sort configurations for consistent display
> - configs = sorted(config_data.keys())
> -
> - # Calculate means and standard deviations
> - baseline_means = []
> - baseline_stds = []
> - dev_means = []
> - dev_stds = []
> -
> - for config in configs:
> - baseline_vals = config_data[config]["baseline"]
> - dev_vals = config_data[config]["dev"]
> -
> - baseline_means.append(np.mean(baseline_vals) if baseline_vals else 0)
> - baseline_stds.append(np.std(baseline_vals) if baseline_vals else 0)
> - dev_means.append(np.mean(dev_vals) if dev_vals else 0)
> - dev_stds.append(np.std(dev_vals) if dev_vals else 0)
> -
> - # Create the plot
> - fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 10))
> -
> - x = np.arange(len(configs))
> - width = 0.35
> -
> - # Top plot: Absolute performance
> - baseline_bars = ax1.bar(
> - x - width / 2,
> - baseline_means,
> - width,
> - yerr=baseline_stds,
> - label="Baseline",
> - color="#1f77b4",
> - capsize=5,
> - )
> - dev_bars = ax1.bar(
> - x + width / 2,
> - dev_means,
> - width,
> - yerr=dev_stds,
> - label="Development",
> - color="#ff7f0e",
> - capsize=5,
> - )
> -
> - ax1.set_ylabel("Insert QPS")
> - ax1.set_title("Vector Database Performance Across Filesystem Configurations")
> - ax1.set_xticks(x)
> - ax1.set_xticklabels(configs, rotation=45, ha="right")
> - ax1.legend()
> - ax1.grid(True, alpha=0.3)
> -
> - # Add value labels on bars
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax1.annotate(
> - f"{height:.0f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - fontsize=8,
> - )
> -
> - # Bottom plot: Percentage improvement (dev vs baseline)
> - improvements = []
> - for i in range(len(configs)):
> - if baseline_means[i] > 0:
> - improvement = ((dev_means[i] - baseline_means[i]) / baseline_means[i]) * 100
> - else:
> - improvement = 0
> - improvements.append(improvement)
> -
> - colors = ["green" if x > 0 else "red" for x in improvements]
> - improvement_bars = ax2.bar(x, improvements, color=colors, alpha=0.7)
> -
> - ax2.set_ylabel("Performance Change (%)")
> - ax2.set_title("Development vs Baseline Performance Change")
> - ax2.set_xticks(x)
> - ax2.set_xticklabels(configs, rotation=45, ha="right")
> - ax2.axhline(y=0, color="black", linestyle="-", linewidth=0.5)
> - ax2.grid(True, alpha=0.3)
> -
> - # Add percentage labels
> - for bar, val in zip(improvement_bars, improvements):
> - ax2.annotate(
> - f"{val:.1f}%",
> - xy=(bar.get_x() + bar.get_width() / 2, val),
> - xytext=(0, 3 if val > 0 else -15),
> - textcoords="offset points",
> - ha="center",
> - va="bottom" if val > 0 else "top",
> - fontsize=8,
> - )
> -
> - plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "comprehensive_fs_comparison.png"), dpi=150)
> - plt.close()
> -
> -
> -def create_fs_latency_comparison(results, output_dir):
> - """Create latency comparison across filesystems"""
> - import matplotlib.pyplot as plt
> - import numpy as np
> - from collections import defaultdict
> -
> - # Collect latency data
> - config_latency = defaultdict(lambda: {"baseline": [], "dev": []})
> -
> - for result in results:
> - fs = result.get("filesystem", "unknown")
> - block_size = result.get("block_size", "")
> -
> - if block_size and block_size != "default":
> - config_label = f"{fs}-{block_size}"
> - else:
> - config_label = fs
> -
> - category = "dev" if result.get("is_dev", False) else "baseline"
> -
> - # Extract latency metrics
> - if "query_performance" in result:
> - latency_p99 = result["query_performance"].get("latency_p99_ms", 0)
> - else:
> - latency_p99 = 0
> -
> - if latency_p99 > 0:
> - config_latency[config_label][category].append(latency_p99)
> -
> - if not config_latency:
> +def create_heatmap_analysis(results, output_dir):
> + """Create multi-filesystem heatmap showing query performance"""
> + if not results:
> return
>
> - # Sort configurations
> - configs = sorted(config_latency.keys())
> -
> - # Calculate statistics
> - baseline_p99 = []
> - dev_p99 = []
> -
> - for config in configs:
> - baseline_vals = config_latency[config]["baseline"]
> - dev_vals = config_latency[config]["dev"]
> -
> - baseline_p99.append(np.mean(baseline_vals) if baseline_vals else 0)
> - dev_p99.append(np.mean(dev_vals) if dev_vals else 0)
> -
> - # Create plot
> - fig, ax = plt.subplots(figsize=(12, 6))
> -
> - x = np.arange(len(configs))
> - width = 0.35
> -
> - baseline_bars = ax.bar(
> - x - width / 2, baseline_p99, width, label="Baseline P99", color="#9467bd"
> - )
> - dev_bars = ax.bar(
> - x + width / 2, dev_p99, width, label="Development P99", color="#e377c2"
> - )
> -
> - ax.set_xlabel("Filesystem Configuration")
> - ax.set_ylabel("Latency P99 (ms)")
> - ax.set_title("Query Latency (P99) Comparison Across Filesystems")
> - ax.set_xticks(x)
> - ax.set_xticklabels(configs, rotation=45, ha="right")
> - ax.legend()
> - ax.grid(True, alpha=0.3)
> -
> - # Add value labels
> - for bars in [baseline_bars, dev_bars]:
> - for bar in bars:
> - height = bar.get_height()
> - if height > 0:
> - ax.annotate(
> - f"{height:.1f}",
> - xy=(bar.get_x() + bar.get_width() / 2, height),
> - xytext=(0, 3),
> - textcoords="offset points",
> - ha="center",
> - va="bottom",
> - fontsize=8,
> - )
> + # Group data by filesystem configuration
> + fs_performance = defaultdict(lambda: {
> + "query_data": [],
> + "config_key": "",
> + })
>
> + for result in results:
> + fs_type, block_size, config_key = _extract_filesystem_config(result)
> +
> + query_perf = result.get("query_performance", {})
> + for topk, topk_data in query_perf.items():
> + for batch, batch_data in topk_data.items():
> + qps = batch_data.get("queries_per_second", 0)
> + fs_performance[config_key]["query_data"].append({
> + "topk": topk,
> + "batch": batch,
> + "qps": qps,
> + })
> + fs_performance[config_key]["config_key"] = config_key
> +
> + # Check if we have multi-filesystem data
> + if len(fs_performance) > 1:
> + # Multi-filesystem mode: separate heatmaps for each filesystem
> + num_fs = len(fs_performance)
> + fig, axes = plt.subplots(1, num_fs, figsize=(5*num_fs, 6))
> + if num_fs == 1:
> + axes = [axes]
> +
> + # Define common structure for consistency
> + topk_order = ["topk_1", "topk_10", "topk_100"]
> + batch_order = ["batch_1", "batch_10", "batch_100"]
> +
> + for idx, (config_key, perf_data) in enumerate(fs_performance.items()):
> + # Create matrix for this filesystem
> + matrix = np.zeros((len(topk_order), len(batch_order)))
> +
> + # Fill matrix with data
> + query_dict = {}
> + for item in perf_data["query_data"]:
> + query_dict[(item["topk"], item["batch"])] = item["qps"]
> +
> + for i, topk in enumerate(topk_order):
> + for j, batch in enumerate(batch_order):
> + matrix[i, j] = query_dict.get((topk, batch), 0)
> +
> + # Plot heatmap
> + im = axes[idx].imshow(matrix, cmap='viridis', aspect='auto')
> + axes[idx].set_title(f"{config_key.upper()} Query Performance")
> + axes[idx].set_xticks(range(len(batch_order)))
> + axes[idx].set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
> + axes[idx].set_yticks(range(len(topk_order)))
> + axes[idx].set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
> +
> + # Add text annotations
> + for i in range(len(topk_order)):
> + for j in range(len(batch_order)):
> + axes[idx].text(j, i, f'{matrix[i, j]:.0f}',
> + ha="center", va="center", color="white", fontweight="bold")
> +
> + # Add colorbar
> + cbar = plt.colorbar(im, ax=axes[idx])
> + cbar.set_label('Queries Per Second (QPS)')
> + else:
> + # Single filesystem mode
> + fig, ax = plt.subplots(1, 1, figsize=(8, 6))
> +
> + if fs_performance:
> + config_key = list(fs_performance.keys())[0]
> + perf_data = fs_performance[config_key]
> +
> + # Create matrix
> + topk_order = ["topk_1", "topk_10", "topk_100"]
> + batch_order = ["batch_1", "batch_10", "batch_100"]
> + matrix = np.zeros((len(topk_order), len(batch_order)))
> +
> + # Fill matrix with data
> + query_dict = {}
> + for item in perf_data["query_data"]:
> + query_dict[(item["topk"], item["batch"])] = item["qps"]
> +
> + for i, topk in enumerate(topk_order):
> + for j, batch in enumerate(batch_order):
> + matrix[i, j] = query_dict.get((topk, batch), 0)
> +
> + # Plot heatmap
> + im = ax.imshow(matrix, cmap='viridis', aspect='auto')
> + ax.set_title("Milvus Query Performance Heatmap")
> + ax.set_xticks(range(len(batch_order)))
> + ax.set_xticklabels([b.replace("batch_", "Batch ") for b in batch_order])
> + ax.set_yticks(range(len(topk_order)))
> + ax.set_yticklabels([t.replace("topk_", "Top-") for t in topk_order])
> +
> + # Add text annotations
> + for i in range(len(topk_order)):
> + for j in range(len(batch_order)):
> + ax.text(j, i, f'{matrix[i, j]:.0f}',
> + ha="center", va="center", color="white", fontweight="bold")
> +
> + # Add colorbar
> + cbar = plt.colorbar(im, ax=ax)
> + cbar.set_label('Queries Per Second (QPS)')
> +
> plt.tight_layout()
> - plt.savefig(os.path.join(output_dir, "filesystem_latency_comparison.png"), dpi=150)
> + plt.savefig(os.path.join(output_dir, "performance_heatmap.png"), dpi=150, bbox_inches="tight")
> plt.close()
>
>
> @@ -1119,56 +340,23 @@ def main():
> results_dir = sys.argv[1]
> output_dir = sys.argv[2]
>
> - # Create output directory
> + # Ensure output directory exists
> os.makedirs(output_dir, exist_ok=True)
>
> # Load results
> results = load_results(results_dir)
> -
> if not results:
> - print("No results found to analyze")
> + print(f"No valid results found in {results_dir}")
> sys.exit(1)
>
> print(f"Loaded {len(results)} result files")
>
> # Generate graphs
> - print("Generating performance heatmap...")
> - create_heatmap_analysis(results, output_dir)
> -
> - print("Generating performance trends...")
> create_simple_performance_trends(results, output_dir)
> + create_heatmap_analysis(results, output_dir)
>
> - print("Generating summary statistics...")
> - summary = generate_summary_statistics(results, output_dir)
> -
> - # Check if we have multiple filesystems to compare
> - filesystems = set(r.get("filesystem", "unknown") for r in results)
> - if len(filesystems) > 1:
> - print("Generating filesystem comparison chart...")
> - create_filesystem_comparison_chart(results, output_dir)
> -
> - print("Generating comprehensive filesystem comparison...")
> - create_comprehensive_fs_comparison(results, output_dir)
> -
> - print("Generating filesystem latency comparison...")
> - create_fs_latency_comparison(results, output_dir)
> -
> - # Check if we have XFS results with different block sizes
> - xfs_results = [r for r in results if r.get("filesystem") == "xfs"]
> - block_sizes = set(r.get("block_size", "unknown") for r in xfs_results)
> - if len(block_sizes) > 1:
> - print("Generating XFS block size analysis...")
> - create_block_size_analysis(results, output_dir)
> -
> - print(f"\nAnalysis complete! Graphs saved to {output_dir}")
> - print(f"Total configurations tested: {summary['total_tests']}")
> - print(
> - f"Best insert QPS: {summary['performance_summary']['best_insert_qps']['value']} ({summary['performance_summary']['best_insert_qps']['config']})"
> - )
> - print(
> - f"Best query QPS: {summary['performance_summary']['best_query_qps']['value']} ({summary['performance_summary']['best_query_qps']['config']})"
> - )
> + print(f"Graphs generated in {output_dir}")
>
>
> if __name__ == "__main__":
> - main()
> + main()
> \ No newline at end of file
> diff --git a/workflows/ai/scripts/generate_html_report.py b/workflows/ai/scripts/generate_html_report.py
> index 3aa8342f..01ec734c 100755
> --- a/workflows/ai/scripts/generate_html_report.py
> +++ b/workflows/ai/scripts/generate_html_report.py
> @@ -180,7 +180,7 @@ HTML_TEMPLATE = """
> </head>
> <body>
> <div class="header">
> - <h1>AI Vector Database Benchmark Results</h1>
> + <h1>Milvus Vector Database Benchmark Results</h1>
> <div class="subtitle">Generated on {timestamp}</div>
> </div>
>
> @@ -238,11 +238,13 @@ HTML_TEMPLATE = """
> </div>
>
> <div id="detailed-results" class="section">
> - <h2>Detailed Results Table</h2>
> + <h2>Milvus Performance by Storage Filesystem</h2>
> + <p>This table shows how Milvus vector database performs when its data is stored on different filesystem types and configurations.</p>
> <table class="results-table">
> <thead>
> <tr>
> - <th>Host</th>
> + <th>Filesystem</th>
> + <th>Configuration</th>
> <th>Type</th>
> <th>Insert QPS</th>
> <th>Query QPS</th>
> @@ -293,27 +295,53 @@ def load_results(results_dir):
> # Get filesystem from JSON data
> fs_type = data.get("filesystem", None)
>
> - # If not in JSON, try to parse from filename (backwards compatibility)
> - if not fs_type and "debian13-ai" in filename:
> - host_parts = (
> - filename.replace("results_debian13-ai-", "")
> - .replace("_1.json", "")
> + # Always try to parse from filename first since JSON data might be wrong
> + if "-ai-" in filename:
> + # Handle both debian13-ai- and prod-ai- prefixes
> + cleaned_filename = filename.replace("results_", "")
> +
> + # Extract the part after -ai-
> + if "debian13-ai-" in cleaned_filename:
> + host_part = cleaned_filename.replace("debian13-ai-", "")
> + elif "prod-ai-" in cleaned_filename:
> + host_part = cleaned_filename.replace("prod-ai-", "")
> + else:
> + # Generic extraction
> + ai_index = cleaned_filename.find("-ai-")
> + if ai_index != -1:
> + host_part = cleaned_filename[ai_index + 4 :] # Skip "-ai-"
> + else:
> + host_part = cleaned_filename
> +
> + # Remove file extensions and dev suffix
> + host_part = (
> + host_part.replace("_1.json", "")
> .replace("_2.json", "")
> .replace("_3.json", "")
> - .split("-")
> + .replace("-dev", "")
> )
> - if "xfs" in host_parts[0]:
> +
> + # Parse filesystem type and block size
> + if host_part.startswith("xfs-"):
> fs_type = "xfs"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "ext4" in host_parts[0]:
> + # Extract block size: xfs-4k-4ks -> 4k
> + parts = host_part.split("-")
> + if len(parts) >= 2:
> + block_size = parts[1] # 4k, 16k, 32k, 64k
> + else:
> + block_size = "4k"
> + elif host_part.startswith("ext4-"):
> fs_type = "ext4"
> - block_size = host_parts[1] if len(host_parts) > 1 else "4k"
> - elif "btrfs" in host_parts[0]:
> + parts = host_part.split("-")
> + block_size = parts[1] if len(parts) > 1 else "4k"
> + elif host_part.startswith("btrfs"):
> fs_type = "btrfs"
> block_size = "default"
> else:
> - fs_type = "unknown"
> - block_size = "unknown"
> + # Fallback to JSON data if available
> + if not fs_type:
> + fs_type = "unknown"
> + block_size = "unknown"
> else:
> # Set appropriate block size based on filesystem
> if fs_type == "btrfs":
> @@ -371,12 +399,36 @@ def generate_table_rows(results, best_configs):
> if config_key in best_configs:
> row_class += " best-config"
>
> + # Generate descriptive labels showing Milvus is running on this filesystem
> + if result["filesystem"] == "xfs" and result["block_size"] != "default":
> + storage_label = f"XFS {result['block_size'].upper()}"
> + config_details = f"Block size: {result['block_size']}, Milvus data on XFS"
> + elif result["filesystem"] == "ext4":
> + storage_label = "EXT4"
> + if "bigalloc" in result.get("host", "").lower():
> + config_details = "EXT4 with bigalloc, Milvus data on ext4"
> + else:
> + config_details = (
> + f"Block size: {result['block_size']}, Milvus data on ext4"
> + )
> + elif result["filesystem"] == "btrfs":
> + storage_label = "BTRFS"
> + config_details = "Default Btrfs settings, Milvus data on Btrfs"
> + else:
> + storage_label = result["filesystem"].upper()
> + config_details = f"Milvus data on {result['filesystem']}"
> +
> + # Extract clean node identifier from hostname
> + node_name = result["host"].replace("results_", "").replace(".json", "")
> +
> row = f"""
> <tr class="{row_class}">
> - <td>{result['host']}</td>
> + <td><strong>{storage_label}</strong></td>
> + <td>{config_details}</td>
> <td>{result['type']}</td>
> <td>{result['insert_qps']:,}</td>
> <td>{result['query_qps']:,}</td>
> + <td><code>{node_name}</code></td>
> <td>{result['timestamp']}</td>
> </tr>
> """
> @@ -483,8 +535,8 @@ def generate_html_report(results_dir, graphs_dir, output_path):
> <li><a href="#block-size-analysis">Block Size Analysis</a></li>"""
>
> filesystem_comparison_section = """<div id="filesystem-comparison" class="section">
> - <h2>Filesystem Performance Comparison</h2>
> - <p>Comparison of vector database performance across different filesystems, showing both baseline and development kernel results.</p>
> + <h2>Milvus Storage Filesystem Comparison</h2>
> + <p>Comparison of Milvus vector database performance when its data is stored on different filesystem types (XFS, ext4, Btrfs) with various configurations.</p>
> <div class="graph-container">
> <img src="graphs/filesystem_comparison.png" alt="Filesystem Comparison">
> </div>
> @@ -499,9 +551,9 @@ def generate_html_report(results_dir, graphs_dir, output_path):
> </div>"""
>
> # Multi-fs mode: show filesystem info
> - fourth_card_title = "Filesystems Tested"
> + fourth_card_title = "Storage Filesystems"
> fourth_card_value = str(len(filesystems_tested))
> - fourth_card_label = ", ".join(filesystems_tested).upper()
> + fourth_card_label = ", ".join(filesystems_tested).upper() + " for Milvus Data"
> else:
> # Single filesystem mode - hide multi-fs sections
> filesystem_nav_items = ""
--
Chuck Lever
* Re: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
2025-08-27 14:47 ` Chuck Lever
@ 2025-08-27 19:24 ` Luis Chamberlain
0 siblings, 0 replies; 8+ messages in thread
From: Luis Chamberlain @ 2025-08-27 19:24 UTC (permalink / raw)
To: Chuck Lever; +Cc: Daniel Gomez, hui81.qi, kundan.kumar, kdevops
On Wed, Aug 27, 2025 at 10:47:51AM -0400, Chuck Lever wrote:
> On 8/27/25 5:32 AM, Luis Chamberlain wrote:
> > Extend the AI workflow to support testing Milvus across multiple
> > filesystem configurations simultaneously. This enables comprehensive
> > performance comparisons between different filesystems and their
> > configuration options.
> >
> > Key features:
> > - Dynamic node generation based on enabled filesystem configurations
> > - Support for XFS, EXT4, and BTRFS with various mount options
> > - Per-filesystem result collection and analysis
> > - A/B testing across all filesystem configurations
> > - Automated comparison graphs between filesystems
> >
> > Filesystem configurations:
> > - XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
> > - EXT4: default, nojournal, bigalloc configurations
> > - BTRFS: default, zlib, lzo, zstd compression options
> >
> > Defconfigs:
> > - ai-milvus-multifs: Test 7 filesystem configs with A/B testing
> > - ai-milvus-multifs-distro: Test with distribution kernels
> > - ai-milvus-multifs-extended: Extended configs (14 filesystems total)
> >
> > Node generation:
> > The system dynamically generates nodes based on enabled filesystem
> > configurations. With A/B testing enabled, this creates baseline and
> > dev nodes for each filesystem (e.g., debian13-ai-xfs-4k and
> > debian13-ai-xfs-4k-dev).
> >
> > Usage:
> > make defconfig-ai-milvus-multifs
> > make bringup # Creates nodes for each filesystem
> > make ai # Setup infrastructure on all nodes
> > make ai-tests # Run benchmarks on all filesystems
> > make ai-results # Collect and compare results
> >
> > This enables systematic evaluation of how different filesystems and
> > their configurations affect vector database performance.
> >
> > Generated-by: Claude AI
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
>
> Hey Luis -
>
> I'm looking at adding "AI optimized" and "GPU optimized" machine size
> choices in the cloud provider Kconfigs. I assume this set can take
> advantage of those. Any suggestions, or let me know if I'm way off base.
That would be the next piece of variability support to add, so patches
are welcome on top! Some docs:
https://milvus.io/docs/v2.3.x/install_standalone-helm-gpu.md
https://milvus.io/docs/gpu_index.md
https://milvus.io/docs/gpu-cagra.md
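For illustration, a minimal pymilvus sketch of what selecting a GPU index
could look like on the benchmark side, assuming a GPU-enabled Milvus
standalone; the collection name, field name and CAGRA parameters below are
placeholders, nothing this series wires up yet:

# Minimal sketch, assuming a GPU-enabled Milvus standalone on localhost and
# an existing collection "bench" with an "embedding" vector field (both are
# placeholders).
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="GPU_CAGRA",
    metric_type="L2",
    params={
        # Example values only; these could become Kconfig knobs later.
        "intermediate_graph_degree": 64,
        "graph_degree": 32,
    },
)
client.create_index(collection_name="bench", index_params=index_params)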
Luis
* Re: [PATCH 0/2] kdevops: add milvus with minio support
2025-08-27 9:31 [PATCH 0/2] kdevops: add milvus with minio support Luis Chamberlain
2025-08-27 9:32 ` [PATCH 1/2] ai: add Milvus vector database benchmarking support Luis Chamberlain
2025-08-27 9:32 ` [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks Luis Chamberlain
@ 2025-08-29 2:05 ` Luis Chamberlain
2 siblings, 0 replies; 8+ messages in thread
From: Luis Chamberlain @ 2025-08-29 2:05 UTC (permalink / raw)
To: Chuck Lever, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
On Wed, Aug 27, 2025 at 02:31:59AM -0700, Luis Chamberlain wrote:
> This adds the ability to test milvus on minio with different filesystem
> configuration targets. There's a basic configuration you can run which
> will just support one default filesystem, which will be used where you
> place your docker image and also where we place the minio instance. Then
> there is multifs support where just as with fstests support on kdevops
> you can select a slew of different filesystem targets to try to test.
>
> Recommendation is to stick to 40 iterations at 1,000,000 tests unless
> you have more than 100 GiB per guest to spare. If you have space to
> spare then you know how to ballpark it.
>
> On High Capacity SSDs, the world is our oyster.
>
> You can see a demo of results here:
>
> https://github.com/mcgrof/demo-milvus-kdevops-results
>
> These are just demos. On guests. Nothing really useful.
> I should point out this has AB testing automated as well so we can
> leverage this to test for instance ... parallel writeback in an
> automated way ;)
>
> If you want to test this you can also use this branch on kdevops:
>
> https://github.com/linux-kdevops/kdevops/tree/mcgrof/20250827-milvus
>
> I am in hopes someone will just prompt an AI for bare metal support
> while I sleep. It should be... easy. Just create the partitions already,
> use one host and ask the prompt to not mkfs for you. So don't use
> multi-fs support at first. Just use the option to create the storage
> partition where you place docker. In fact you can copy and paste this
> prompt the the AI, and I think it will know what to do. You just skip
> some steps as the filesystems can be created and mounted for you. You
> just need the host file created by you manually for the target node.
> That and infer user and group id support (WORKFLOW_INFER_USER_AND_GROUP).
Pushed.
Luis
* Re: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
2025-08-27 9:32 ` [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks Luis Chamberlain
2025-08-27 14:47 ` Chuck Lever
@ 2025-09-01 20:11 ` Daniel Gomez
2025-09-01 20:27 ` Luis Chamberlain
1 sibling, 1 reply; 8+ messages in thread
From: Daniel Gomez @ 2025-09-01 20:11 UTC (permalink / raw)
To: Luis Chamberlain, Chuck Lever, Daniel Gomez, hui81.qi,
kundan.kumar, kdevops
On 27/08/2025 11.32, Luis Chamberlain wrote:
> Extend the AI workflow to support testing Milvus across multiple
> filesystem configurations simultaneously. This enables comprehensive
> performance comparisons between different filesystems and their
> configuration options.
>
> Key features:
> - Dynamic node generation based on enabled filesystem configurations
> - Support for XFS, EXT4, and BTRFS with various mount options
> - Per-filesystem result collection and analysis
> - A/B testing across all filesystem configurations
> - Automated comparison graphs between filesystems
>
> Filesystem configurations:
> - XFS: default, nocrc, bigtime with various block sizes (512, 1k, 2k, 4k)
> - EXT4: default, nojournal, bigalloc configurations
> - BTRFS: default, zlib, lzo, zstd compression options
>
> Defconfigs:
> - ai-milvus-multifs: Test 7 filesystem configs with A/B testing
> - ai-milvus-multifs-distro: Test with distribution kernels
> - ai-milvus-multifs-extended: Extended configs (14 filesystems total)
>
> Node generation:
> The system dynamically generates nodes based on enabled filesystem
> configurations. With A/B testing enabled, this creates baseline and
> dev nodes for each filesystem (e.g., debian13-ai-xfs-4k and
> debian13-ai-xfs-4k-dev).
>
> Usage:
> make defconfig-ai-milvus-multifs
> make bringup # Creates nodes for each filesystem
> make ai # Setup infrastructure on all nodes
> make ai-tests # Run benchmarks on all filesystems
> make ai-results # Collect and compare results
>
> This enables systematic evaluation of how different filesystems and
> their configurations affect vector database performance.
>
> Generated-by: Claude AI
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
...
> diff --git a/playbooks/roles/gen_hosts/tasks/main.yml b/playbooks/roles/gen_hosts/tasks/main.yml
> index 4b35d9f6..d36790b0 100644
> --- a/playbooks/roles/gen_hosts/tasks/main.yml
> +++ b/playbooks/roles/gen_hosts/tasks/main.yml
> @@ -381,6 +381,25 @@
> - workflows_reboot_limit
> - ansible_hosts_template.stat.exists
>
> +- name: Load AI nodes configuration for multi-filesystem setup
> + include_vars:
> + file: "{{ topdir_path }}/{{ kdevops_nodes }}"
> + name: guestfs_nodes
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_hosts_template.stat.exists
> +
> +- name: Extract AI node names for multi-filesystem setup
> + set_fact:
> + all_generic_nodes: "{{ guestfs_nodes.guestfs_nodes | map(attribute='name') | list }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - guestfs_nodes is defined
> +
> - name: Generate the Ansible hosts file for a dedicated AI setup
> tags: ['hosts']
> ansible.builtin.template:
> diff --git a/playbooks/roles/gen_hosts/templates/fstests.j2 b/playbooks/roles/gen_hosts/templates/fstests.j2
> index ac086c6e..32d90abf 100644
> --- a/playbooks/roles/gen_hosts/templates/fstests.j2
> +++ b/playbooks/roles/gen_hosts/templates/fstests.j2
> @@ -70,6 +70,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [krb5:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable or kdevops_smbd_enable or kdevops_krb5_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -85,3 +86,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/gitr.j2 b/playbooks/roles/gen_hosts/templates/gitr.j2
> index 7f9094d4..3f30a5fb 100644
> --- a/playbooks/roles/gen_hosts/templates/gitr.j2
> +++ b/playbooks/roles/gen_hosts/templates/gitr.j2
> @@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/hosts.j2 b/playbooks/roles/gen_hosts/templates/hosts.j2
> index cdcd1883..e9441605 100644
> --- a/playbooks/roles/gen_hosts/templates/hosts.j2
> +++ b/playbooks/roles/gen_hosts/templates/hosts.j2
> @@ -119,39 +119,30 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [ai:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
>
> -{% set fs_configs = [] %}
> +{# Individual section groups for multi-filesystem testing #}
> +{% set section_names = [] %}
> {% for node in all_generic_nodes %}
> -{% set node_parts = node.split('-') %}
> -{% if node_parts|length >= 3 %}
> -{% set fs_type = node_parts[2] %}
> -{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
> -{% set fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
> -{% if fs_group not in fs_configs %}
> -{% set _ = fs_configs.append(fs_group) %}
> +{% if not node.endswith('-dev') %}
> +{% set section = node.replace(kdevops_host_prefix + '-ai-', '') %}
> +{% if section != kdevops_host_prefix + '-ai' %}
> +{% if section_names.append(section) %}{% endif %}
> {% endif %}
> {% endif %}
> {% endfor %}
>
> -{% for fs_group in fs_configs %}
> -[ai_{{ fs_group }}]
> -{% for node in all_generic_nodes %}
> -{% set node_parts = node.split('-') %}
> -{% if node_parts|length >= 3 %}
> -{% set fs_type = node_parts[2] %}
> -{% set fs_config = node_parts[3:] | select('ne', 'dev') | join('_') %}
> -{% set node_fs_group = fs_type + '_' + fs_config if fs_config else fs_type %}
> -{% if node_fs_group == fs_group %}
> -{{ node }}
> -{% endif %}
> +{% for section in section_names %}
> +[ai_{{ section | replace('-', '_') }}]
> +{{ kdevops_host_prefix }}-ai-{{ section }}
> +{% if kdevops_baseline_and_dev %}
> +{{ kdevops_host_prefix }}-ai-{{ section }}-dev
> {% endif %}
> -{% endfor %}
>
> -[ai_{{ fs_group }}:vars]
> +[ai_{{ section | replace('-', '_') }}:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
>
> {% endfor %}
> {% else %}
> -{# Single-node AI hosts #}
> +{# Single filesystem hosts (original behavior) #}
> [all]
> localhost ansible_connection=local
> {{ kdevops_host_prefix }}-ai
> diff --git a/playbooks/roles/gen_hosts/templates/nfstest.j2 b/playbooks/roles/gen_hosts/templates/nfstest.j2
> index e427ac34..709d871d 100644
> --- a/playbooks/roles/gen_hosts/templates/nfstest.j2
> +++ b/playbooks/roles/gen_hosts/templates/nfstest.j2
> @@ -38,6 +38,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> +{% if kdevops_enable_iscsi or kdevops_nfsd_enable %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -47,3 +48,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {% endif %}
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_hosts/templates/pynfs.j2 b/playbooks/roles/gen_hosts/templates/pynfs.j2
> index 85c87dae..55add4d1 100644
> --- a/playbooks/roles/gen_hosts/templates/pynfs.j2
> +++ b/playbooks/roles/gen_hosts/templates/pynfs.j2
> @@ -23,6 +23,7 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {{ kdevops_hosts_prefix }}-nfsd
> [nfsd:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% if true %}
> [service]
> {% if kdevops_enable_iscsi %}
> {{ kdevops_hosts_prefix }}-iscsi
> @@ -30,3 +31,4 @@ ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> {{ kdevops_hosts_prefix }}-nfsd
> [service:vars]
> ansible_python_interpreter = "{{ kdevops_python_interpreter }}"
> +{% endif %}
> diff --git a/playbooks/roles/gen_nodes/tasks/main.yml b/playbooks/roles/gen_nodes/tasks/main.yml
> index d54977be..b294d294 100644
> --- a/playbooks/roles/gen_nodes/tasks/main.yml
> +++ b/playbooks/roles/gen_nodes/tasks/main.yml
> @@ -658,6 +658,7 @@
> - kdevops_workflow_enable_ai
> - ansible_nodes_template.stat.exists
> - not kdevops_baseline_and_dev
> + - not ai_enable_multifs_testing|default(false)|bool
>
> - name: Generate the AI kdevops nodes file with dev hosts using {{ kdevops_nodes_template }} as jinja2 source template
> tags: ['hosts']
> @@ -675,6 +676,95 @@
> - kdevops_workflow_enable_ai
> - ansible_nodes_template.stat.exists
> - kdevops_baseline_and_dev
> + - not ai_enable_multifs_testing|default(false)|bool
> +
> +- name: Infer enabled AI multi-filesystem configurations
> + vars:
> + kdevops_config_data: "{{ lookup('file', topdir_path + '/.config') }}"
> + # Find all enabled AI multifs configurations
> + xfs_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_XFS_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'xfs-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_XFS=y$', multiline=True)
> + else []
> + }}
> + ext4_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_EXT4_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'ext4-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_EXT4=y$', multiline=True)
> + else []
> + }}
> + btrfs_configs: >-
> + {{
> + kdevops_config_data | regex_findall('^CONFIG_AI_MULTIFS_BTRFS_(.*)=y$', multiline=True)
> + | map('lower')
> + | map('regex_replace', '_', '-')
> + | map('regex_replace', '^', 'btrfs-')
> + | list
> + if kdevops_config_data | regex_search('^CONFIG_AI_MULTIFS_TEST_BTRFS=y$', multiline=True)
> + else []
> + }}
> + set_fact:
> + ai_multifs_enabled_configs: "{{ (xfs_configs + ext4_configs + btrfs_configs) | unique }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> +
> +- name: Create AI nodes for each filesystem configuration (no dev)
> + vars:
> + filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
> + set_fact:
> + ai_enabled_section_types: "{{ filesystem_nodes }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - not kdevops_baseline_and_dev
> + - ai_multifs_enabled_configs is defined
> + - ai_multifs_enabled_configs | length > 0
> +
> +- name: Create AI nodes for each filesystem configuration with dev hosts
> + vars:
> + filesystem_nodes: "{{ [kdevops_host_prefix + '-ai-'] | product(ai_multifs_enabled_configs | default([])) | map('join') | list }}"
> + set_fact:
> + ai_enabled_section_types: "{{ filesystem_nodes | product(['', '-dev']) | map('join') | list }}"
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - kdevops_baseline_and_dev
> + - ai_multifs_enabled_configs is defined
> + - ai_multifs_enabled_configs | length > 0
> +
> +- name: Generate the AI multi-filesystem kdevops nodes file using {{ kdevops_nodes_template }} as jinja2 source template
> + tags: [ 'hosts' ]
> + vars:
> + node_template: "{{ kdevops_nodes_template | basename }}"
> + nodes: "{{ ai_enabled_section_types | regex_replace('\\[') | regex_replace('\\]') | replace(\"'\", '') | split(', ') }}"
> + all_generic_nodes: "{{ ai_enabled_section_types }}"
> + template:
> + src: "{{ node_template }}"
> + dest: "{{ topdir_path }}/{{ kdevops_nodes }}"
> + force: yes
> + when:
> + - kdevops_workflows_dedicated_workflow
> + - kdevops_workflow_enable_ai
> + - ai_enable_multifs_testing|default(false)|bool
> + - ansible_nodes_template.stat.exists
> + - ai_enabled_section_types is defined
> + - ai_enabled_section_types | length > 0
>
> - name: Get the control host's timezone
> ansible.builtin.command: "timedatectl show -p Timezone --value"
> diff --git a/playbooks/roles/guestfs/tasks/bringup/main.yml b/playbooks/roles/guestfs/tasks/bringup/main.yml
> index c131de25..bd9f5260 100644
> --- a/playbooks/roles/guestfs/tasks/bringup/main.yml
> +++ b/playbooks/roles/guestfs/tasks/bringup/main.yml
> @@ -1,11 +1,16 @@
> ---
> - name: List defined libvirt guests
> run_once: true
> + delegate_to: localhost
> community.libvirt.virt:
> command: list_vms
> uri: "{{ libvirt_uri }}"
> register: defined_vms
>
> +- name: Debug defined VMs
> + debug:
> + msg: "Hostname: {{ inventory_hostname }}, Defined VMs: {{ hostvars['localhost']['defined_vms']['list_vms'] | default([]) }}, Check: {{ inventory_hostname not in (hostvars['localhost']['defined_vms']['list_vms'] | default([])) }}"
> +
> - name: Provision each target node
> when:
> - "inventory_hostname not in defined_vms.list_vms"
> @@ -25,10 +30,13 @@
> path: "{{ ssh_key_dir }}"
> state: directory
> mode: "u=rwx"
> + delegate_to: localhost
>
> - name: Generate fresh keys for each target node
> ansible.builtin.command:
> cmd: 'ssh-keygen -q -t ed25519 -f {{ ssh_key }} -N ""'
> + creates: "{{ ssh_key }}"
> + delegate_to: localhost
>
> - name: Set the pathname of the root disk image for each target node
> ansible.builtin.set_fact:
> @@ -38,15 +46,18 @@
> ansible.builtin.file:
> path: "{{ storagedir }}/{{ inventory_hostname }}"
> state: directory
> + delegate_to: localhost
>
> - name: Duplicate the root disk image for each target node
> ansible.builtin.command:
> cmd: "cp --reflink=auto {{ base_image }} {{ root_image }}"
> + delegate_to: localhost
>
> - name: Get the timezone of the control host
> ansible.builtin.command:
> cmd: "timedatectl show -p Timezone --value"
> register: host_timezone
> + delegate_to: localhost
>
> - name: Build the root image for each target node (as root)
> become: true
> @@ -103,6 +114,7 @@
> name: "{{ inventory_hostname }}"
> xml: "{{ lookup('file', xml_file) }}"
> uri: "{{ libvirt_uri }}"
> + delegate_to: localhost
>
> - name: Find PCIe passthrough devices
> ansible.builtin.find:
> @@ -110,6 +122,7 @@
> file_type: file
> patterns: "pcie_passthrough_*.xml"
> register: passthrough_devices
> + delegate_to: localhost
>
> - name: Attach PCIe passthrough devices to each target node
> environment:
> @@ -124,6 +137,7 @@
> loop: "{{ passthrough_devices.files }}"
> loop_control:
> label: "Doing PCI-E passthrough for device {{ item }}"
> + delegate_to: localhost
> when:
> - passthrough_devices.matched > 0
>
> @@ -142,3 +156,4 @@
> name: "{{ inventory_hostname }}"
> uri: "{{ libvirt_uri }}"
> state: running
> + delegate_to: localhost
> diff --git a/scripts/guestfs.Makefile b/scripts/guestfs.Makefile
> index bd03f58c..f6c350a4 100644
> --- a/scripts/guestfs.Makefile
> +++ b/scripts/guestfs.Makefile
> @@ -79,7 +79,7 @@ bringup_guestfs: $(GUESTFS_BRINGUP_DEPS)
> --extra-vars=@./extra_vars.yaml \
> --tags network,pool,base_image
> $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
> - --limit 'baseline:dev:service' \
> + --limit 'baseline:dev:service:ai' \
I'm not sure if I understand the need for this new ai group. Can you clarify?
Why aren't baseline or dev groups sufficient for this AI workload?
What's the role of this ai group?
Note: I kept the hunks above to make it easier to reference the part I believe
is most relevant to my questions (hopefully).
* Re: [PATCH 2/2] ai: add multi-filesystem testing support for Milvus benchmarks
2025-09-01 20:11 ` Daniel Gomez
@ 2025-09-01 20:27 ` Luis Chamberlain
0 siblings, 0 replies; 8+ messages in thread
From: Luis Chamberlain @ 2025-09-01 20:27 UTC (permalink / raw)
To: Daniel Gomez; +Cc: Chuck Lever, Daniel Gomez, hui81.qi, kundan.kumar, kdevops
On Mon, Sep 1, 2025 at 1:11 PM Daniel Gomez <da.gomez@kernel.org> wrote:
>
> > diff --git a/scripts/guestfs.Makefile b/scripts/guestfs.Makefile
> > index bd03f58c..f6c350a4 100644
> > --- a/scripts/guestfs.Makefile
> > +++ b/scripts/guestfs.Makefile
> > @@ -79,7 +79,7 @@ bringup_guestfs: $(GUESTFS_BRINGUP_DEPS)
> > --extra-vars=@./extra_vars.yaml \
> > --tags network,pool,base_image
> > $(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
> > - --limit 'baseline:dev:service' \
> > + --limit 'baseline:dev:service:ai' \
>
> I'm not sure if I understand the need for this new ai group. Can you clarify?
>
> Why aren't baseline or dev groups sufficient for this AI workload?
> What's the role of this ai group?
>
> Note: I kept the hunks above to make it easier to reference the part I believe
> is most relevant to my questions (hopefully).
No good reason that I can think of, we can nuke it.
Luis