From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 21A6E1CF5C6
	for <kdevops@lists.linux.dev>; Tue, 12 Aug 2025 00:42:54 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.137.202.133
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1754959378; cv=none; b=dSZxVTrSDh9NlNilLyQ+bsLJ+CutC1eRWxZFOqrc7hPEWKXFSlpxjDmpKwHz67+9DjA94bOZcHMgNQonToP9gC+CkRCxGykSaCzhvVQjN2f4MeL5w1Bbz5v/gYcxUO2J+ASNCV1m0zwDWDcqYfKABUiDDsH0MFZj+12WfGdEetw=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1754959378; c=relaxed/simple;
	bh=ZCMnsy8/UpaEPxkMsM+mydXtBr3yx6bpR910f1Bwir8=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type; b=duhMEMf9sK2CmRPTdIwQ6pTPUbJpqAWRDDdtMhgdPx5vJ0V4tD+97mvbBF7NUpOt/i7XTzSqP2hTpZZnSjCbdeYU5suW8wpVGfoCpjkHB0Kbz78Si5dDaGj2IQ/lBevi/DIM2G2tW3EwBzSJv3v1WQrc0ZPS/KC3fSor/dLeeOc=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=kI9OZSsF; arc=none smtp.client-ip=198.137.202.133
Authentication-Results: smtp.subspace.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=kernel.org
Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="kI9OZSsF"
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding:
	Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:Cc:
	To:From:Reply-To:Content-ID:Content-Description;
	bh=9A2E9yO53SPy6TzXxsNGj3wmONvuzaQEbDd33AYjtk8=; b=kI9OZSsFL1z7kLJqqMX33A1ZFz
	0S0OoEUzOkH2gGfIfbeVl2A56EogXgoqrDZLdRGrPrkt81c4OlNAfarIoKHiUh92jOQTovcCBmrrM
	Zi9m30ekYGxIxEGzcWjPYzTSRfjzd1h2M5+/dOaC1gKDF5pq03g5hheKONJrfuveShm0OiHAAbAoK
	scGUSxv+vxDvkmP+HY3sxUaxdWODzwFcUtFWYAwFyhXiCg3+sEiIf0nsaxgtIt/OmkHLMOimkhDn6
	Ivzf+/NiaoBuoG6GeJbNXb5THQ2e74apCJTNviGWPIjk2HAj8iKfMYEjhc2xDfWqnwwjk5iV5JgAy
	bKsvdUyQ==;
Received: from mcgrof by bombadil.infradead.org with local (Exim 4.98.2 #2 (Red Hat Linux))
	id 1uld6L-00000009T2X-3HMB;
	Tue, 12 Aug 2025 00:42:53 +0000
From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>,
	Daniel Gomez <da.gomez@kruces.com>,
	kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 2/2] mmtests: add AB testing and comparison support
Date: Mon, 11 Aug 2025 17:42:51 -0700
Message-ID: <20250812004252.2256571-3-mcgrof@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250812004252.2256571-1-mcgrof@kernel.org>
References: <20250812004252.2256571-1-mcgrof@kernel.org>
Precedence: bulk
X-Mailing-List: kdevops@lists.linux.dev
List-Id: <kdevops.lists.linux.dev>
List-Subscribe: <mailto:kdevops+subscribe@lists.linux.dev>
List-Unsubscribe: <mailto:kdevops+unsubscribe@lists.linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: Luis Chamberlain <mcgrof@infradead.org>

This commit introduces comprehensive A/B testing and comparison
capabilities for the mmtests workflow in kdevops, enabling performance
regression detection between baseline and development kernels.

Key Features Added:
- A/B testing configuration support for mmtests workflow
- Automated performance comparison between baseline and dev nodes
- Visual performance analysis with graph generation
- HTML reports with embedded performance graphs

New Components:
1. Defconfigs:
   - mmtests-ab-testing: Basic A/B testing setup
   - mmtests-ab-testing-thpcompact: Advanced config with monitoring

2. Comparison Infrastructure (playbooks/roles/mmtests_compare/):
   - Automated result collection from baseline and dev nodes
   - Local mmtests repository management with patch support
   - Multiple comparison output formats (HTML, text, graphs)
   - Shell scripts for graph generation and HTML embedding

3. Playbook Integration:
   - mmtests-compare.yml: Orchestrates the comparison workflow
   - Updated mmtests.yml to target mmtests group specifically
   - Enhanced hosts template with localhost and mmtests group

4. Result Visualization:
   - Performance graphs (main, sorted, smoothed trends)
   - Monitor data visualization (vmstat, mpstat, proc stats)
   - Professional HTML reports with embedded graphs
   - Comprehensive comparison tables with statistical analysis

5. Workflow Enhancements:
   - Support for applying patches from fixes directory
   - Python script for advanced graph generation
   - Makefile targets for comparison workflow
   - Results organization in workflows/mmtests/results/

Technical Improvements:
- Added localhost to mmtests hosts template for local operations
- Added dedicated mmtests group definition in hosts template
- Support for applying patches from fixes directory
- Robust error handling in comparison scripts
- Dependency management for Perl and Python tools
- Temporary file management in /tmp for comparisons

Included Patches:
- Fix undefined array reference in mmtests compare
- Fix library order in thpcompact gcc command

The implementation supports the standard kdevops A/B testing pattern
where baseline nodes run the stable kernel and dev nodes run the
development kernel, with automated comparison and visualization of
performance differences between them.

Usage:
  make defconfig-mmtests-ab-testing
  make bringup
  make mmtests
  make mmtests-compare

This enables developers to quickly identify performance regressions
and improvements between kernel versions with professional-quality
reports and visualizations.

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 .gitignore                                    |   2 +
 defconfigs/mmtests-ab-testing                 |  22 +
 defconfigs/mmtests-ab-testing-thpcompact      |  31 +
 playbooks/mmtests-compare.yml                 |   5 +
 playbooks/mmtests.yml                         |   2 +-
 .../roles/gen_hosts/templates/mmtests.j2      |  10 +
 playbooks/roles/mmtests/tasks/main.yaml       |  58 +-
 .../roles/mmtests_compare/defaults/main.yml   |   6 +
 .../mmtests_compare/files/apply_patch.sh      |  32 +
 .../files/embed_graphs_in_html.sh             | 100 +++
 .../mmtests_compare/files/generate_graphs.sh  | 148 +++++
 .../files/generate_html_with_graphs.sh        |  89 +++
 .../mmtests_compare/files/run_comparison.sh   |  58 ++
 .../roles/mmtests_compare/tasks/main.yml      | 472 ++++++++++++++
 .../templates/comparison_report.html.j2       | 414 ++++++++++++
 scripts/generate_mmtests_graphs.py            | 598 ++++++++++++++++++
 workflows/mmtests/Makefile                    |  19 +-
 ...fined-array-reference-when-no-operat.patch |  46 ++
 ...act-fix-library-order-in-gcc-command.patch |  33 +
 19 files changed, 2138 insertions(+), 7 deletions(-)
 create mode 100644 defconfigs/mmtests-ab-testing
 create mode 100644 defconfigs/mmtests-ab-testing-thpcompact
 create mode 100644 playbooks/mmtests-compare.yml
 create mode 100644 playbooks/roles/mmtests_compare/defaults/main.yml
 create mode 100755 playbooks/roles/mmtests_compare/files/apply_patch.sh
 create mode 100755 playbooks/roles/mmtests_compare/files/embed_graphs_in_html.sh
 create mode 100755 playbooks/roles/mmtests_compare/files/generate_graphs.sh
 create mode 100755 playbooks/roles/mmtests_compare/files/generate_html_with_graphs.sh
 create mode 100755 playbooks/roles/mmtests_compare/files/run_comparison.sh
 create mode 100644 playbooks/roles/mmtests_compare/tasks/main.yml
 create mode 100644 playbooks/roles/mmtests_compare/templates/comparison_report.html.j2
 create mode 100644 scripts/generate_mmtests_graphs.py
 create mode 100644 workflows/mmtests/fixes/0001-compare-Fix-undefined-array-reference-when-no-operat.patch
 create mode 100644 workflows/mmtests/fixes/0002-thpcompact-fix-library-order-in-gcc-command.patch

diff --git a/.gitignore b/.gitignore
index 2e28c3f7..095880ab 100644
--- a/.gitignore
+++ b/.gitignore
@@ -67,6 +67,8 @@ workflows/ltp/results/
 workflows/nfstest/results/
 
 workflows/sysbench/results/
+workflows/mmtests/results/
+tmp
 
 playbooks/roles/linux-mirror/linux-mirror-systemd/*.service
 playbooks/roles/linux-mirror/linux-mirror-systemd/*.timer
diff --git a/defconfigs/mmtests-ab-testing b/defconfigs/mmtests-ab-testing
new file mode 100644
index 00000000..5d4dd2db
--- /dev/null
+++ b/defconfigs/mmtests-ab-testing
@@ -0,0 +1,22 @@
+CONFIG_GUESTFS=y
+CONFIG_LIBVIRT=y
+
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+
+CONFIG_BOOTLINUX=y
+
+# Enable baseline and dev testing
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+
+# Enable mmtests workflow
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_MMTESTS=y
+
+# mmtests configuration - using defaults
+CONFIG_MMTESTS_ENABLE_THPCOMPACT=y
+
+# Filesystem for tests
+CONFIG_MMTESTS_FS_XFS=y
diff --git a/defconfigs/mmtests-ab-testing-thpcompact b/defconfigs/mmtests-ab-testing-thpcompact
new file mode 100644
index 00000000..cbcb30b9
--- /dev/null
+++ b/defconfigs/mmtests-ab-testing-thpcompact
@@ -0,0 +1,31 @@
+CONFIG_GUESTFS=y
+CONFIG_LIBVIRT=y
+
+CONFIG_WORKFLOWS=y
+CONFIG_WORKFLOW_LINUX_CUSTOM=y
+
+CONFIG_BOOTLINUX=y
+
+# Enable baseline and dev testing
+CONFIG_KDEVOPS_BASELINE_AND_DEV=y
+
+# Enable A/B testing with different kernel references
+CONFIG_BOOTLINUX_AB_DIFFERENT_REF=y
+
+# Enable mmtests workflow
+CONFIG_WORKFLOWS_TESTS=y
+CONFIG_WORKFLOWS_LINUX_TESTS=y
+CONFIG_WORKFLOWS_DEDICATED_WORKFLOW=y
+CONFIG_KDEVOPS_WORKFLOW_DEDICATE_MMTESTS=y
+
+# mmtests configuration
+CONFIG_MMTESTS_ENABLE_THPCOMPACT=y
+CONFIG_MMTESTS_ITERATIONS=5
+CONFIG_MMTESTS_MONITOR_INTERVAL=1
+CONFIG_MMTESTS_MONITOR_ENABLE_FTRACE=y
+CONFIG_MMTESTS_MONITOR_ENABLE_PROC_MONITORING=y
+CONFIG_MMTESTS_MONITOR_ENABLE_MPSTAT=y
+CONFIG_MMTESTS_PRETEST_THP_SETTING="always"
+
+# Filesystem for tests
+CONFIG_MMTESTS_FS_XFS=y
diff --git a/playbooks/mmtests-compare.yml b/playbooks/mmtests-compare.yml
new file mode 100644
index 00000000..7e948672
--- /dev/null
+++ b/playbooks/mmtests-compare.yml
@@ -0,0 +1,5 @@
+---
+- hosts: localhost
+  roles:
+    - role: mmtests_compare
+      when: kdevops_baseline_and_dev|bool
diff --git a/playbooks/mmtests.yml b/playbooks/mmtests.yml
index f66e65db..4e395db6 100644
--- a/playbooks/mmtests.yml
+++ b/playbooks/mmtests.yml
@@ -1,4 +1,4 @@
 ---
-- hosts: all
+- hosts: mmtests
   roles:
     - role: mmtests
diff --git a/playbooks/roles/gen_hosts/templates/mmtests.j2 b/playbooks/roles/gen_hosts/templates/mmtests.j2
index d32ffe40..1252fe87 100644
--- a/playbooks/roles/gen_hosts/templates/mmtests.j2
+++ b/playbooks/roles/gen_hosts/templates/mmtests.j2
@@ -1,4 +1,5 @@
 [all]
+localhost ansible_connection=local
 {% for test_type in mmtests_enabled_test_types %}
 {{ kdevops_host_prefix }}-{{ test_type }}
 {% if kdevops_baseline_and_dev %}
@@ -21,3 +22,12 @@ ansible_python_interpreter =  "{{ kdevops_python_interpreter }}"
 {% endif %}
 [dev:vars]
 ansible_python_interpreter =  "{{ kdevops_python_interpreter }}"
+[mmtests]
+{% for test_type in mmtests_enabled_test_types %}
+{{ kdevops_host_prefix }}-{{ test_type }}
+{% if kdevops_baseline_and_dev %}
+{{ kdevops_host_prefix }}-{{ test_type }}-dev
+{% endif %}
+{% endfor %}
+[mmtests:vars]
+ansible_python_interpreter =  "{{ kdevops_python_interpreter }}"
diff --git a/playbooks/roles/mmtests/tasks/main.yaml b/playbooks/roles/mmtests/tasks/main.yaml
index 93bc4bd9..199c8bdd 100644
--- a/playbooks/roles/mmtests/tasks/main.yaml
+++ b/playbooks/roles/mmtests/tasks/main.yaml
@@ -21,7 +21,6 @@
     path: "{{ data_path }}"
     owner: "{{ data_user }}"
     group: "{{ data_group }}"
-    recurse: yes
     state: directory
 
 - name: Clone mmtests repository
@@ -32,6 +31,63 @@
     version: "{{ mmtests_git_version }}"
     force: yes
 
+- name: Check if mmtests fixes directory exists
+  tags: [ 'setup' ]
+  delegate_to: localhost
+  stat:
+    path: "{{ topdir_path }}/workflows/mmtests/fixes/"
+  register: fixes_dir
+  run_once: false
+
+- name: Find mmtests patches in fixes directory
+  tags: [ 'setup' ]
+  delegate_to: localhost
+  find:
+    paths: "{{ topdir_path }}/workflows/mmtests/fixes/"
+    patterns: "*.patch"
+  register: mmtests_patches
+  when: fixes_dir.stat.exists
+  run_once: false
+
+- name: Copy patches to remote host
+  tags: [ 'setup' ]
+  become: yes
+  become_method: sudo
+  copy:
+    src: "{{ item.path }}"
+    dest: "/tmp/{{ item.path | basename }}"
+    mode: '0644'
+  with_items: "{{ mmtests_patches.files }}"
+  when:
+    - fixes_dir.stat.exists
+    - mmtests_patches.files | length > 0
+
+- name: Apply mmtests patches on remote host
+  tags: [ 'setup' ]
+  become: yes
+  become_method: sudo
+  shell: |
+    cd {{ mmtests_data_dir }}
+    git am /tmp/{{ item.path | basename }}
+  with_items: "{{ mmtests_patches.files }}"
+  when:
+    - fixes_dir.stat.exists
+    - mmtests_patches.files | length > 0
+  ignore_errors: true
+  register: patch_results
+
+- name: Report patch application results
+  tags: [ 'setup' ]
+  debug:
+    msg: |
+      Applied {{ mmtests_patches.files | length | default(0) }} patches from fixes directory:
+      {% for patch in mmtests_patches.files | default([]) %}
+      - {{ patch.path | basename }}
+      {% endfor %}
+  when:
+    - fixes_dir.stat.exists
+    - mmtests_patches.files | length > 0
+
 - name: Generate mmtests configuration
   tags: [ 'setup' ]
   become: yes
diff --git a/playbooks/roles/mmtests_compare/defaults/main.yml b/playbooks/roles/mmtests_compare/defaults/main.yml
new file mode 100644
index 00000000..201a5278
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/defaults/main.yml
@@ -0,0 +1,6 @@
+---
+# mmtests compare role defaults
+mmtests_data_dir: "{{ data_path }}/mmtests"
+mmtests_results_dir: "{{ mmtests_data_dir }}/work/log/{{ inventory_hostname }}-{{ kernel_version.stdout }}"
+# Git URL is from extra_vars.yaml, fallback to GitHub
+mmtests_git_url: "{{ mmtests_git_url | default('https://github.com/gormanm/mmtests.git') }}"
diff --git a/playbooks/roles/mmtests_compare/files/apply_patch.sh b/playbooks/roles/mmtests_compare/files/apply_patch.sh
new file mode 100755
index 00000000..172a5bd1
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/files/apply_patch.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+# Script to apply mmtests patches with proper error handling
+
+TOPDIR="$1"
+PATCH_FILE="$2"
+
+cd "$TOPDIR/tmp/mmtests" || exit 1
+
+PATCH_NAME=$(basename "$PATCH_FILE")
+
+# Check if patch is already applied by looking for the specific fix
+if grep -q "if (@operations > 0 && exists" bin/lib/MMTests/Compare.pm 2>/dev/null; then
+    echo "Patch $PATCH_NAME appears to be already applied"
+    exit 0
+fi
+
+# Try to apply with git apply first
+if git apply --check "$PATCH_FILE" 2>/dev/null; then
+    git apply "$PATCH_FILE"
+    echo "Applied patch with git: $PATCH_NAME"
+    exit 0
+fi
+
+# Try with patch command as fallback
+if patch -p1 --dry-run < "$PATCH_FILE" >/dev/null 2>&1; then
+    patch -p1 < "$PATCH_FILE"
+    echo "Applied patch with patch command: $PATCH_NAME"
+    exit 0
+fi
+
+echo "Failed to apply $PATCH_NAME - may already be applied or conflicting"
+exit 0  # Don't fail the playbook
diff --git a/playbooks/roles/mmtests_compare/files/embed_graphs_in_html.sh b/playbooks/roles/mmtests_compare/files/embed_graphs_in_html.sh
new file mode 100755
index 00000000..ee591f0d
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/files/embed_graphs_in_html.sh
@@ -0,0 +1,100 @@
+#!/bin/bash
+# Script to embed graphs in the comparison HTML
+
+COMPARISON_HTML="$1"
+COMPARE_DIR="$2"
+
+# Check if comparison.html exists
+if [ ! -f "$COMPARISON_HTML" ]; then
+    echo "ERROR: $COMPARISON_HTML not found"
+    exit 1
+fi
+
+# Create a backup of the original
+cp "$COMPARISON_HTML" "${COMPARISON_HTML}.bak"
+
+# Create new HTML with embedded graphs
+{
+    echo '<!DOCTYPE html>'
+    echo '<html><head>'
+    echo '<title>mmtests Comparison with Graphs</title>'
+    echo '<style>'
+    echo 'body { font-family: Arial, sans-serif; margin: 20px; background-color: #f5f5f5; }'
+    echo '.container { max-width: 1400px; margin: 0 auto; background: white; padding: 30px; border-radius: 10px; box-shadow: 0 0 20px rgba(0,0,0,0.1); }'
+    echo '.resultsTable { border-collapse: collapse; width: 100%; margin: 20px 0; }'
+    echo '.resultsTable th { background-color: #3498db; color: white; padding: 10px; text-align: center; font-weight: bold; }'
+    echo '.resultsTable td { padding: 8px; text-align: right; font-family: monospace; border-bottom: 1px solid #ddd; }'
+    echo '.resultsTable tr:nth-child(odd) { background: #f9f9f9; }'
+    echo '.graph-section { background: #ecf0f1; padding: 20px; border-radius: 10px; margin: 20px 0; }'
+    echo '.graph-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(500px, 1fr)); gap: 20px; }'
+    echo '.graph-item { text-align: center; }'
+    echo '.graph-item img { max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }'
+    echo 'h1 { color: #2c3e50; text-align: center; border-bottom: 3px solid #3498db; padding-bottom: 10px; }'
+    echo 'h2 { color: #34495e; border-left: 4px solid #3498db; padding-left: 15px; }'
+    echo 'h3 { color: #34495e; font-size: 1.1em; }'
+    echo '</style>'
+    echo '</head><body>'
+    echo '<div class="container">'
+    echo '<h1>mmtests Performance Comparison</h1>'
+
+    # Add graphs section if any graphs exist
+    if ls "$COMPARE_DIR"/*.png >/dev/null 2>&1; then
+        echo '<div class="graph-section">'
+        echo '<h2>Performance Graphs</h2>'
+        echo '<div class="graph-grid">'
+
+        # Main benchmark graph first
+        for graph in "$COMPARE_DIR"/graph-*compact.png; do
+            if [ -f "$graph" ] && [[ ! "$graph" =~ -sorted|-smooth ]]; then
+                echo '<div class="graph-item">'
+                echo '<h3>Main Performance Comparison</h3>'
+                echo "<img src='$(basename "$graph")' alt='Main Performance'>"
+                echo '</div>'
+            fi
+        done
+
+        # Sorted graph
+        if [ -f "$COMPARE_DIR/graph-thpcompact-sorted.png" ]; then
+            echo '<div class="graph-item">'
+            echo '<h3>Sorted Samples</h3>'
+            echo '<img src="graph-thpcompact-sorted.png" alt="Sorted Samples">'
+            echo '</div>'
+        fi
+
+        # Smooth graph
+        if [ -f "$COMPARE_DIR/graph-thpcompact-smooth.png" ]; then
+            echo '<div class="graph-item">'
+            echo '<h3>Smoothed Trend</h3>'
+            echo '<img src="graph-thpcompact-smooth.png" alt="Smoothed Trend">'
+            echo '</div>'
+        fi
+
+        # Any monitor graphs
+        for graph in "$COMPARE_DIR"/graph-vmstat.png "$COMPARE_DIR"/graph-proc-vmstat.png "$COMPARE_DIR"/graph-mpstat.png; do
+            if [ -f "$graph" ]; then
+                graphname=$(basename "$graph" .png | sed 's/graph-//')
+                echo '<div class="graph-item">'
+                echo "<h3>${graphname^^} Monitor</h3>"
+                echo "<img src='$(basename "$graph")' alt='$graphname'>"
+                echo '</div>'
+            fi
+        done
+
+        echo '</div>'
+        echo '</div>'
+    fi
+
+    # Add the original comparison table
+    echo '<div class="graph-section">'
+    echo '<h2>Detailed Comparison Table</h2>'
+    cat "$COMPARISON_HTML"
+    echo '</div>'
+
+    echo '</div></body></html>'
+} > "${COMPARISON_HTML}.new"
+
+# Replace the original with the new version
+mv "${COMPARISON_HTML}.new" "$COMPARISON_HTML"
+
+echo "Graphs embedded in $COMPARISON_HTML"
+exit 0
diff --git a/playbooks/roles/mmtests_compare/files/generate_graphs.sh b/playbooks/roles/mmtests_compare/files/generate_graphs.sh
new file mode 100755
index 00000000..28b06911
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/files/generate_graphs.sh
@@ -0,0 +1,148 @@
+#!/bin/bash
+# Script to generate mmtests graphs with proper error handling
+
+set -e
+
+TOPDIR="$1"
+BENCHMARK="$2"
+BASELINE_NAME="$3"
+DEV_NAME="$4"
+OUTPUT_DIR="$5"
+
+cd "$TOPDIR/tmp/mmtests"
+
+echo "Generating graphs for $BENCHMARK comparison"
+
+# Create output directory if it doesn't exist
+mkdir -p "$OUTPUT_DIR"
+
+# Set up kernel list for graph generation
+KERNEL_LIST="$BASELINE_NAME,$DEV_NAME"
+
+# Check if we have the required tools
+if [ ! -f ./bin/graph-mmtests.sh ]; then
+    echo "ERROR: graph-mmtests.sh not found"
+    exit 1
+fi
+
+if [ ! -f ./bin/extract-mmtests.pl ]; then
+    echo "ERROR: extract-mmtests.pl not found"
+    exit 1
+fi
+
+# Generate the main benchmark comparison graph
+echo "Generating main benchmark graph..."
+./bin/graph-mmtests.sh \
+    -d work/log/ \
+    -b "$BENCHMARK" \
+    -n "$KERNEL_LIST" \
+    --format png \
+    --output "$OUTPUT_DIR/graph-$BENCHMARK" \
+    --title "$BENCHMARK Performance Comparison" 2>&1 | tee "$OUTPUT_DIR/graph-generation.log"
+
+# Check if the graph was created
+if [ -f "$OUTPUT_DIR/graph-$BENCHMARK.png" ]; then
+    echo "Main benchmark graph created: graph-$BENCHMARK.png"
+else
+    echo "WARNING: Main benchmark graph was not created"
+fi
+
+# Generate sorted sample graphs
+echo "Generating sorted sample graph..."
+./bin/graph-mmtests.sh \
+    -d work/log/ \
+    -b "$BENCHMARK" \
+    -n "$KERNEL_LIST" \
+    --format png \
+    --output "$OUTPUT_DIR/graph-$BENCHMARK-sorted" \
+    --title "$BENCHMARK Performance (Sorted)" \
+    --sort-samples-reverse \
+    --x-label "Sorted samples" 2>&1 | tee -a "$OUTPUT_DIR/graph-generation.log"
+
+# Generate smooth curve graphs
+echo "Generating smooth curve graph..."
+./bin/graph-mmtests.sh \
+    -d work/log/ \
+    -b "$BENCHMARK" \
+    -n "$KERNEL_LIST" \
+    --format png \
+    --output "$OUTPUT_DIR/graph-$BENCHMARK-smooth" \
+    --title "$BENCHMARK Performance (Smoothed)" \
+    --with-smooth 2>&1 | tee -a "$OUTPUT_DIR/graph-generation.log"
+
+# Generate monitor graphs if data is available
+echo "Checking for monitor data..."
+
+# Function to generate monitor graph
+generate_monitor_graph() {
+    local monitor_type="$1"
+    local title="$2"
+
+    # Check if monitor data exists for any of the kernels
+    for kernel in $BASELINE_NAME $DEV_NAME; do
+        if [ -f "work/log/$kernel/$monitor_type-$BENCHMARK.gz" ] || [ -f "work/log/$kernel/$monitor_type.gz" ]; then
+            echo "Generating $monitor_type graph..."
+            ./bin/graph-mmtests.sh \
+                -d work/log/ \
+                -b "$BENCHMARK" \
+                -n "$KERNEL_LIST" \
+                --format png \
+                --output "$OUTPUT_DIR/graph-$monitor_type" \
+                --title "$title" \
+                --print-monitor "$monitor_type" 2>&1 | tee -a "$OUTPUT_DIR/graph-generation.log"
+
+            if [ -f "$OUTPUT_DIR/graph-$monitor_type.png" ]; then
+                echo "Monitor graph created: graph-$monitor_type.png"
+            fi
+            break
+        fi
+    done
+}
+
+# Generate various monitor graphs
+generate_monitor_graph "vmstat" "VM Statistics"
+generate_monitor_graph "proc-vmstat" "Process VM Statistics"
+generate_monitor_graph "mpstat" "CPU Statistics"
+generate_monitor_graph "proc-buddyinfo" "Buddy Info"
+generate_monitor_graph "proc-pagetypeinfo" "Page Type Info"
+
+# List all generated graphs
+echo ""
+echo "Generated graphs:"
+ls -la "$OUTPUT_DIR"/*.png 2>/dev/null || echo "No PNG files generated"
+
+# Create an HTML file that embeds all the graphs
+cat > "$OUTPUT_DIR/graphs.html" << 'EOF'
+<!DOCTYPE html>
+<html><head>
+<title>mmtests Graphs</title>
+<style>
+body { font-family: Arial, sans-serif; margin: 20px; background-color: #f5f5f5; }
+.container { max-width: 1200px; margin: 0 auto; background: white; padding: 30px; border-radius: 10px; box-shadow: 0 0 20px rgba(0,0,0,0.1); }
+h1 { color: #2c3e50; text-align: center; border-bottom: 3px solid #3498db; padding-bottom: 10px; }
+h2 { color: #34495e; border-left: 4px solid #3498db; padding-left: 15px; }
+.graph { margin: 20px 0; text-align: center; }
+.graph img { max-width: 100%; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
+.description { background: #e8f8f5; border-left: 4px solid #2ecc71; padding: 15px; margin: 15px 0; }
+</style>
+</head>
+<body>
+<div class="container">
+<h1>mmtests Performance Graphs</h1>
+EOF
+
+# Add each graph to the HTML file
+for graph in "$OUTPUT_DIR"/*.png; do
+    if [ -f "$graph" ]; then
+        graphname=$(basename "$graph" .png)
+        echo "<div class='graph'>" >> "$OUTPUT_DIR/graphs.html"
+        echo "<h2>$graphname</h2>" >> "$OUTPUT_DIR/graphs.html"
+        echo "<img src='$(basename $graph)' alt='$graphname'>" >> "$OUTPUT_DIR/graphs.html"
+        echo "</div>" >> "$OUTPUT_DIR/graphs.html"
+    fi
+done
+
+echo "</div></body></html>" >> "$OUTPUT_DIR/graphs.html"
+
+echo "Graph generation complete. HTML summary: $OUTPUT_DIR/graphs.html"
+exit 0
diff --git a/playbooks/roles/mmtests_compare/files/generate_html_with_graphs.sh b/playbooks/roles/mmtests_compare/files/generate_html_with_graphs.sh
new file mode 100755
index 00000000..a334d0fa
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/files/generate_html_with_graphs.sh
@@ -0,0 +1,89 @@
+#!/bin/bash
+# Script to generate HTML report with embedded graphs using compare-kernels.sh
+
+set -e
+
+TOPDIR="$1"
+BENCHMARK="$2"
+BASELINE_NAME="$3"
+DEV_NAME="$4"
+OUTPUT_DIR="$5"
+
+cd "$TOPDIR/tmp/mmtests/work/log"
+
+echo "Generating HTML report with embedded graphs using compare-kernels.sh"
+
+# Ensure output directory is absolute path
+if [[ "$OUTPUT_DIR" != /* ]]; then
+    OUTPUT_DIR="$TOPDIR/$OUTPUT_DIR"
+fi
+
+# Create output directory if it doesn't exist
+mkdir -p "$OUTPUT_DIR"
+
+# Check if compare-kernels.sh exists
+if [ ! -f ../../compare-kernels.sh ]; then
+    echo "ERROR: compare-kernels.sh not found"
+    exit 1
+fi
+
+# Generate the HTML report with graphs
+echo "Running compare-kernels.sh for $BASELINE_NAME vs $DEV_NAME"
+
+# Export R_TMPDIR for caching R objects (performance optimization)
+export R_TMPDIR="$TOPDIR/tmp/mmtests_r_tmp"
+mkdir -p "$R_TMPDIR"
+
+# Suppress package installation prompts by pre-answering
+export MMTESTS_AUTO_PACKAGE_INSTALL=never
+
+# Run compare-kernels.sh with HTML format
+# The HTML output goes to stdout, graphs go to output-dir
+echo "Generating graphs and HTML report..."
+../../compare-kernels.sh \
+    --baseline "$BASELINE_NAME" \
+    --compare "$DEV_NAME" \
+    --format html \
+    --output-dir "$OUTPUT_DIR" \
+    --report-title "$BENCHMARK Performance Comparison" \
+    > "$OUTPUT_DIR/comparison.html" 2> "$OUTPUT_DIR/compare-kernels.log"
+
+# Check if the HTML was created
+if [ -f "$OUTPUT_DIR/comparison.html" ] && [ -s "$OUTPUT_DIR/comparison.html" ]; then
+    echo "HTML report with graphs created: $OUTPUT_DIR/comparison.html"
+
+    # Clean up any package installation artifacts from the HTML
+    # Remove lines about package installation
+    sed -i '/MMTests needs to install/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/dpkg-query: no packages found/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/E: Unable to locate package/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/WARNING: Failed to cleanly install/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/Reading package lists/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/Building dependency tree/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/Reading state information/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/Installed perl-File-Which/d' "$OUTPUT_DIR/comparison.html"
+    sed -i '/Unrecognised argument:/d' "$OUTPUT_DIR/comparison.html"
+else
+    echo "ERROR: Failed to generate HTML report"
+    echo "Check $OUTPUT_DIR/compare-kernels.log for errors"
+    exit 1
+fi
+
+# Count generated graphs
+PNG_COUNT=$(ls -1 "$OUTPUT_DIR"/*.png 2>/dev/null | wc -l)
+echo "Generated $PNG_COUNT graph files"
+
+# List all generated files
+echo ""
+echo "Generated files:"
+ls -la "$OUTPUT_DIR"/*.html 2>/dev/null | head -5
+echo "..."
+ls -la "$OUTPUT_DIR"/*.png 2>/dev/null | head -10
+
+# Clean up R temp directory
+rm -rf "$R_TMPDIR"
+
+echo ""
+echo "HTML report generation complete"
+echo "Main report: $OUTPUT_DIR/comparison.html"
+exit 0
diff --git a/playbooks/roles/mmtests_compare/files/run_comparison.sh b/playbooks/roles/mmtests_compare/files/run_comparison.sh
new file mode 100755
index 00000000..b95bea63
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/files/run_comparison.sh
@@ -0,0 +1,58 @@
+#!/bin/bash
+# Script to run mmtests comparison with proper error handling
+
+set -e
+
+TOPDIR="$1"
+BENCHMARK="$2"
+BASELINE_NAME="$3"
+DEV_NAME="$4"
+OUTPUT_DIR="$5"
+
+cd "$TOPDIR/tmp/mmtests"
+
+# First, verify the script exists and is executable
+if [ ! -f ./bin/compare-mmtests.pl ]; then
+    echo "ERROR: compare-mmtests.pl not found"
+    exit 1
+fi
+
+# Create output directory if it doesn't exist
+mkdir -p "$OUTPUT_DIR"
+
+# Run the comparison with error checking for HTML output
+echo "Running HTML comparison for $BASELINE_NAME vs $DEV_NAME"
+./bin/compare-mmtests.pl \
+    --directory work/log/ \
+    --benchmark "$BENCHMARK" \
+    --names "$BASELINE_NAME,$DEV_NAME" \
+    --format html > "$OUTPUT_DIR/comparison.html" 2>&1
+
+# Check if the output file was created and has content
+if [ ! -s "$OUTPUT_DIR/comparison.html" ]; then
+    echo "WARNING: comparison.html is empty or not created"
+    # Check for the specific error we're trying to fix
+    if grep -q "Can't use an undefined value as an ARRAY reference" "$OUTPUT_DIR/comparison.html" 2>/dev/null; then
+        echo "ERROR: The patch to fix undefined array reference was not applied correctly"
+        exit 1
+    fi
+else
+    echo "HTML comparison completed successfully"
+fi
+
+# Run text comparison
+echo "Running text comparison for $BASELINE_NAME vs $DEV_NAME"
+./bin/compare-mmtests.pl \
+    --directory work/log/ \
+    --benchmark "$BENCHMARK" \
+    --names "$BASELINE_NAME,$DEV_NAME" \
+    > "$OUTPUT_DIR/comparison.txt" 2>&1
+
+# Verify the text output was created
+if [ ! -s "$OUTPUT_DIR/comparison.txt" ]; then
+    echo "WARNING: comparison.txt is empty or not created"
+else
+    echo "Text comparison completed successfully"
+fi
+
+exit 0
diff --git a/playbooks/roles/mmtests_compare/tasks/main.yml b/playbooks/roles/mmtests_compare/tasks/main.yml
new file mode 100644
index 00000000..9ddbbbe0
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/tasks/main.yml
@@ -0,0 +1,472 @@
+---
+- name: Install Perl dependencies for mmtests compare on localhost (Debian/Ubuntu)
+  delegate_to: localhost
+  become: yes
+  become_method: sudo
+  apt:
+    name:
+      - perl
+      - perl-doc
+      - cpanminus
+      - libfile-which-perl
+      - libfile-slurp-perl
+      - libjson-perl
+      - liblist-moreutils-perl
+      - gnuplot
+      - python3-matplotlib
+      - python3-numpy
+    state: present
+    update_cache: true
+  when: ansible_facts['os_family']|lower == 'debian'
+  run_once: true
+  tags: ['compare', 'deps']
+
+- name: Install additional Perl modules via CPAN on localhost (if needed)
+  delegate_to: localhost
+  become: yes
+  become_method: sudo
+  cpanm:
+    name: "{{ item }}"
+  with_items:
+    - File::Temp
+  when: ansible_facts['os_family']|lower == 'debian'
+  run_once: true
+  tags: ['compare', 'deps']
+  ignore_errors: true
+
+- name: Install Perl dependencies for mmtests compare on localhost (SUSE)
+  delegate_to: localhost
+  become: yes
+  become_method: sudo
+  zypper:
+    name:
+      - perl
+      - perl-File-Which
+      - perl-File-Slurp
+      - perl-JSON
+      - perl-List-MoreUtils
+      - perl-Data-Dumper
+      - perl-Digest-MD5
+      - perl-Getopt-Long
+      - perl-Pod-Usage
+      - perl-App-cpanminus
+      - gnuplot
+      - python3-matplotlib
+      - python3-numpy
+    state: present
+  when: ansible_facts['os_family']|lower == 'suse'
+  run_once: true
+  tags: ['compare', 'deps']
+
+- name: Install Perl dependencies for mmtests compare on localhost (RedHat/Fedora)
+  delegate_to: localhost
+  become: yes
+  become_method: sudo
+  yum:
+    name:
+      - perl
+      - perl-File-Which
+      - perl-File-Slurp
+      - perl-JSON
+      - perl-List-MoreUtils
+      - perl-Data-Dumper
+      - perl-Digest-MD5
+      - perl-Getopt-Long
+      - perl-Pod-Usage
+      - perl-App-cpanminus
+      - gnuplot
+      - python3-matplotlib
+      - python3-numpy
+    state: present
+  when: ansible_facts['os_family']|lower == 'redhat'
+  run_once: true
+  tags: ['compare', 'deps']
+
+- name: Create required directories
+  delegate_to: localhost
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: '0755'
+  loop:
+    - "{{ topdir_path }}/workflows/mmtests/results/compare"
+    - "{{ topdir_path }}/tmp"
+  run_once: true
+  tags: ['compare']
+
+- name: Clone mmtests repository locally
+  delegate_to: localhost
+  ansible.builtin.git:
+    repo: "{{ mmtests_git_url }}"
+    dest: "{{ topdir_path }}/tmp/mmtests"
+    version: "{{ mmtests_git_version | default('master') }}"
+    force: yes
+  run_once: true
+  tags: ['compare']
+
+- name: Check if mmtests fixes directory exists
+  delegate_to: localhost
+  stat:
+    path: "{{ topdir_path }}/workflows/mmtests/fixes/"
+  register: fixes_dir
+  run_once: true
+  tags: ['compare']
+
+- name: Find mmtests patches in fixes directory
+  delegate_to: localhost
+  find:
+    paths: "{{ topdir_path }}/workflows/mmtests/fixes/"
+    patterns: "*.patch"
+  register: mmtests_patches
+  when: fixes_dir.stat.exists
+  run_once: true
+  tags: ['compare']
+
+- name: Apply mmtests patches if found
+  delegate_to: localhost
+  ansible.builtin.patch:
+    src: "{{ item.path }}"
+    basedir: "{{ topdir_path }}/tmp/mmtests"
+    strip: 1
+  loop: "{{ mmtests_patches.files }}"
+  when:
+    - fixes_dir.stat.exists
+    - mmtests_patches.files | length > 0
+  run_once: true
+  tags: ['compare']
+  failed_when: false
+  register: patch_results
+
+- name: Get kernel versions from nodes
+  block:
+    - name: Get baseline kernel version
+      command: uname -r
+      register: baseline_kernel_version
+      delegate_to: "{{ groups['baseline'][0] }}"
+      run_once: true
+
+    - name: Get dev kernel version
+      command: uname -r
+      register: dev_kernel_version
+      delegate_to: "{{ groups['dev'][0] }}"
+      run_once: true
+      when:
+        - groups['dev'] is defined
+        - groups['dev'] | length > 0
+  tags: ['compare']
+
+- name: Set node information facts
+  set_fact:
+    baseline_hostname: "{{ groups['baseline'][0] }}"
+    baseline_kernel: "{{ baseline_kernel_version.stdout }}"
+    dev_hostname: "{{ groups['dev'][0] }}"
+    dev_kernel: "{{ dev_kernel_version.stdout }}"
+  run_once: true
+  delegate_to: localhost
+  tags: ['compare']
+
+- name: Create local results directories for mmtests data
+  delegate_to: localhost
+  ansible.builtin.file:
+    path: "{{ topdir_path }}/tmp/mmtests/work/log/{{ item }}"
+    state: directory
+    mode: '0755'
+  loop:
+    - "{{ baseline_hostname }}-{{ baseline_kernel }}"
+    - "{{ dev_hostname }}-{{ dev_kernel }}"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Archive baseline results on remote
+  archive:
+    path: "{{ mmtests_data_dir }}/work/log/{{ baseline_hostname }}-{{ baseline_kernel }}"
+    dest: "/tmp/baseline-mmtests-results.tar.gz"
+    format: gz
+  delegate_to: "{{ groups['baseline'][0] }}"
+  run_once: true
+  tags: ['compare']
+
+- name: Archive dev results on remote
+  archive:
+    path: "{{ mmtests_data_dir }}/work/log/{{ dev_hostname }}-{{ dev_kernel }}"
+    dest: "/tmp/dev-mmtests-results.tar.gz"
+    format: gz
+  delegate_to: "{{ groups['dev'][0] }}"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Fetch baseline results to localhost
+  fetch:
+    src: "/tmp/baseline-mmtests-results.tar.gz"
+    dest: "{{ topdir_path }}/tmp/"
+    flat: yes
+  delegate_to: "{{ groups['baseline'][0] }}"
+  run_once: true
+  tags: ['compare']
+
+- name: Fetch dev results to localhost
+  fetch:
+    src: "/tmp/dev-mmtests-results.tar.gz"
+    dest: "{{ topdir_path }}/tmp/"
+    flat: yes
+  delegate_to: "{{ groups['dev'][0] }}"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Extract baseline results locally
+  delegate_to: localhost
+  unarchive:
+    src: "{{ topdir_path }}/tmp/baseline-mmtests-results.tar.gz"
+    dest: "{{ topdir_path }}/tmp/mmtests/work/log/"
+    remote_src: yes
+  run_once: true
+  tags: ['compare']
+
+- name: Extract dev results locally
+  delegate_to: localhost
+  unarchive:
+    src: "{{ topdir_path }}/tmp/dev-mmtests-results.tar.gz"
+    dest: "{{ topdir_path }}/tmp/mmtests/work/log/"
+    remote_src: yes
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Run mmtests comparison
+  delegate_to: localhost
+  ansible.builtin.command:
+    cmd: |
+      ./bin/compare-mmtests.pl \
+        --directory work/log/ \
+        --benchmark {{ mmtests_test_type }} \
+        --names {{ baseline_hostname }}-{{ baseline_kernel }},{{ dev_hostname }}-{{ dev_kernel }}
+    chdir: "{{ topdir_path }}/tmp/mmtests"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  register: comparison_text_output
+  tags: ['compare']
+
+- name: Generate HTML comparison output
+  delegate_to: localhost
+  ansible.builtin.command:
+    cmd: |
+      ./bin/compare-mmtests.pl \
+        --directory work/log/ \
+        --benchmark {{ mmtests_test_type }} \
+        --names {{ baseline_hostname }}-{{ baseline_kernel }},{{ dev_hostname }}-{{ dev_kernel }} \
+        --format html
+    chdir: "{{ topdir_path }}/tmp/mmtests"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  register: comparison_html_output
+  tags: ['compare']
+
+- name: Parse comparison data for template
+  delegate_to: localhost
+  set_fact:
+    comparison_metrics: []
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Generate performance graphs using gnuplot
+  delegate_to: localhost
+  block:
+    - name: Check for available iterations data
+      find:
+        paths: "{{ topdir_path }}/tmp/mmtests/work/log/{{ item }}/{{ mmtests_test_type }}"
+        patterns: "*.gz"
+        recurse: yes
+      register: iteration_files
+      loop:
+        - "{{ baseline_hostname }}-{{ baseline_kernel }}"
+        - "{{ dev_hostname }}-{{ dev_kernel }}"
+
+    - name: Extract iteration data files
+      ansible.builtin.unarchive:
+        src: "{{ item.path }}"
+        dest: "{{ item.path | dirname }}"
+        remote_src: yes
+      loop: "{{ iteration_files.results | map(attribute='files') | flatten }}"
+      when: iteration_files.results is defined
+
+    - name: Generate comparison graphs with compare-kernels.sh
+      ansible.builtin.command:
+        cmd: |
+          ./compare-kernels.sh \
+            --baseline {{ baseline_hostname }}-{{ baseline_kernel }} \
+            --compare {{ dev_hostname }}-{{ dev_kernel }} \
+            --output-dir {{ topdir_path }}/workflows/mmtests/results/compare
+        chdir: "{{ topdir_path }}/tmp/mmtests/work/log"
+      register: graph_generation
+      failed_when: false
+      environment:
+        MMTESTS_AUTO_PACKAGE_INSTALL: never
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare', 'graphs']
+
+- name: Find generated graph files
+  delegate_to: localhost
+  find:
+    paths: "{{ topdir_path }}/workflows/mmtests/results/compare"
+    patterns: "*.png"
+  register: graph_files
+  run_once: true
+  tags: ['compare', 'graphs']
+
+- name: Read graph files for embedding
+  delegate_to: localhost
+  slurp:
+    src: "{{ item.path }}"
+  register: graph_data
+  loop: "{{ graph_files.files[:10] }}"  # Limit to first 10 graphs
+  when: graph_files.files is defined
+  run_once: true
+  tags: ['compare', 'graphs']
+
+- name: Prepare graph data for template
+  delegate_to: localhost
+  set_fact:
+    performance_graphs: []
+  run_once: true
+  when: graph_files.files is not defined or graph_files.files | length == 0
+  tags: ['compare', 'graphs']
+
+- name: Build graph data list
+  delegate_to: localhost
+  set_fact:
+    performance_graphs: "{{ performance_graphs | default([]) + [{'embedded_data': item.content, 'title': item.item.path | basename | regex_replace('.png', '')}] }}"
+  loop: "{{ graph_data.results | default([]) }}"
+  run_once: true
+  when:
+    - graph_data is defined
+    - graph_data.results is defined
+  tags: ['compare', 'graphs']
+
+- name: Generate benchmark description
+  delegate_to: localhost
+  set_fact:
+    benchmark_description: |
+      {% if mmtests_test_type == 'thpcompact' %}
+      <p><strong>thpcompact</strong> tests memory management performance, specifically:</p>
+      <ul>
+        <li>Base page (4KB) allocation performance</li>
+        <li>Huge page (2MB) allocation performance</li>
+        <li>Memory compaction efficiency</li>
+        <li>Threading scalability</li>
+      </ul>
+      <p><em>Lower values indicate better (faster) performance.</em></p>
+      {% elif mmtests_test_type == 'hackbench' %}
+      <p><strong>hackbench</strong> measures scheduler and IPC performance through:</p>
+      <ul>
+        <li>Process/thread creation and destruction</li>
+        <li>Context switching overhead</li>
+        <li>Inter-process communication</li>
+        <li>Scheduler scalability</li>
+      </ul>
+      <p><em>Lower values indicate better performance.</em></p>
+      {% elif mmtests_test_type == 'kernbench' %}
+      <p><strong>kernbench</strong> measures kernel compilation performance:</p>
+      <ul>
+        <li>Overall system performance</li>
+        <li>CPU and memory bandwidth</li>
+        <li>I/O subsystem performance</li>
+        <li>Parallel compilation efficiency</li>
+      </ul>
+      <p><em>Lower compilation times indicate better performance.</em></p>
+      {% else %}
+      <p><strong>{{ mmtests_test_type }}</strong> benchmark for Linux kernel performance testing.</p>
+      {% endif %}
+  run_once: true
+  tags: ['compare']
+
+- name: Generate comparison report from template
+  delegate_to: localhost
+  template:
+    src: comparison_report.html.j2
+    dest: "{{ topdir_path }}/workflows/mmtests/results/compare/comparison_report.html"
+    mode: '0644'
+  vars:
+    benchmark_name: "{{ mmtests_test_type }}"
+    test_description: "Performance Benchmark"
+    analysis_date: "{{ ansible_date_time.date }}"
+    analysis_time: "{{ ansible_date_time.time }}"
+    baseline_hostname: "{{ baseline_hostname }}"
+    baseline_kernel: "{{ baseline_kernel }}"
+    dev_hostname: "{{ dev_hostname }}"
+    dev_kernel: "{{ dev_kernel }}"
+    benchmark_description: "{{ benchmark_description | default('') }}"
+    raw_comparison_html: "{{ comparison_html_output.stdout | default('') }}"
+    comparison_data: "{{ comparison_metrics | default([]) }}"
+    performance_graphs: "{{ performance_graphs | default([]) }}"
+    monitor_graphs: []
+    summary_stats: []
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Save comparison outputs
+  delegate_to: localhost
+  copy:
+    content: "{{ item.content }}"
+    dest: "{{ item.dest }}"
+    mode: '0644'
+  loop:
+    - content: "{{ comparison_text_output.stdout | default('No comparison data') }}"
+      dest: "{{ topdir_path }}/workflows/mmtests/results/compare/comparison.txt"
+    - content: "{{ comparison_html_output.stdout | default('<p>No comparison data</p>') }}"
+      dest: "{{ topdir_path }}/workflows/mmtests/results/compare/comparison_raw.html"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Copy full results to final location
+  delegate_to: localhost
+  ansible.builtin.copy:
+    src: "{{ topdir_path }}/tmp/mmtests/work/log/{{ item }}/"
+    dest: "{{ topdir_path }}/workflows/mmtests/results/{{ item.split('-')[0] }}/"
+    remote_src: yes
+  loop:
+    - "{{ baseline_hostname }}-{{ baseline_kernel }}"
+    - "{{ dev_hostname }}-{{ dev_kernel }}"
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Display comparison report location
+  debug:
+    msg: |
+      🎯 mmtests Comparison Reports Generated:
+
+      📊 Enhanced Analysis:
+      - Template-based HTML: {{ topdir_path }}/workflows/mmtests/results/compare/comparison_report.html
+      - PNG graphs: {{ graph_files.matched | default(0) }} files in {{ topdir_path }}/workflows/mmtests/results/compare/
+
+      📋 Standard Reports:
+      - Raw HTML: {{ topdir_path }}/workflows/mmtests/results/compare/comparison_raw.html
+      - Text: {{ topdir_path }}/workflows/mmtests/results/compare/comparison.txt
+
+      📁 Full Results:
+      - Baseline: {{ topdir_path }}/workflows/mmtests/results/{{ baseline_hostname }}/
+      - Dev: {{ topdir_path }}/workflows/mmtests/results/{{ dev_hostname }}/
+
+      🚀 Open comparison_report.html for the best analysis experience!
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare']
+
+- name: Clean up temporary archives on remote nodes
+  file:
+    path: "/tmp/{{ item }}-mmtests-results.tar.gz"
+    state: absent
+  delegate_to: "{{ groups[item][0] }}"
+  loop:
+    - baseline
+    - dev
+  run_once: true
+  when: kdevops_baseline_and_dev|bool
+  tags: ['compare', 'cleanup']
diff --git a/playbooks/roles/mmtests_compare/templates/comparison_report.html.j2 b/playbooks/roles/mmtests_compare/templates/comparison_report.html.j2
new file mode 100644
index 00000000..55e6d407
--- /dev/null
+++ b/playbooks/roles/mmtests_compare/templates/comparison_report.html.j2
@@ -0,0 +1,414 @@
+<!DOCTYPE html>
+<html>
+<head>
+    <title>mmtests Analysis: {{ baseline_hostname }}-{{ baseline_kernel }} vs {{ dev_hostname }}-{{ dev_kernel }}</title>
+    <style>
+        body {
+            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, 'Helvetica Neue', Arial, sans-serif;
+            margin: 0;
+            padding: 0;
+            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+            min-height: 100vh;
+        }
+        .container {
+            max-width: 1400px;
+            margin: 20px auto;
+            background: white;
+            border-radius: 20px;
+            box-shadow: 0 20px 60px rgba(0,0,0,0.3);
+            overflow: hidden;
+        }
+        .header {
+            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+            color: white;
+            padding: 40px;
+            text-align: center;
+        }
+        .header h1 {
+            margin: 0;
+            font-size: 2.5em;
+            font-weight: 300;
+            letter-spacing: -1px;
+        }
+        .header .subtitle {
+            margin-top: 10px;
+            font-size: 1.2em;
+            opacity: 0.9;
+        }
+        .content {
+            padding: 40px;
+        }
+        .info-grid {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
+            gap: 20px;
+            margin-bottom: 40px;
+        }
+        .info-card {
+            background: #f8f9fa;
+            border-radius: 10px;
+            padding: 20px;
+            border-left: 4px solid #667eea;
+        }
+        .info-card h3 {
+            margin: 0 0 10px 0;
+            color: #495057;
+            font-size: 0.9em;
+            text-transform: uppercase;
+            letter-spacing: 1px;
+        }
+        .info-card .value {
+            font-size: 1.3em;
+            color: #212529;
+            font-weight: 600;
+        }
+        .section {
+            margin: 40px 0;
+        }
+        .section-header {
+            background: linear-gradient(90deg, #667eea 0%, transparent 100%);
+            color: white;
+            padding: 15px 25px;
+            border-radius: 10px 10px 0 0;
+            margin-bottom: 0;
+        }
+        .section-header h2 {
+            margin: 0;
+            font-weight: 400;
+        }
+        .description {
+            background: #e8f4fd;
+            border-left: 4px solid #0066cc;
+            padding: 20px;
+            margin: 20px 0;
+            border-radius: 5px;
+        }
+        .description h3 {
+            margin-top: 0;
+            color: #0066cc;
+        }
+        .resultsTable {
+            width: 100%;
+            border-collapse: separate;
+            border-spacing: 0;
+            margin: 0;
+            background: white;
+            border-radius: 0 0 10px 10px;
+            overflow: hidden;
+            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
+        }
+        .resultsTable thead th {
+            background: #f8f9fa;
+            color: #495057;
+            padding: 15px;
+            text-align: left;
+            font-weight: 600;
+            border-bottom: 2px solid #dee2e6;
+        }
+        .resultsTable tbody td {
+            padding: 12px 15px;
+            border-bottom: 1px solid #e9ecef;
+        }
+        .resultsTable tbody tr:hover {
+            background: #f8f9fa;
+        }
+        .resultsTable tbody tr:last-child td {
+            border-bottom: none;
+        }
+        .metric-name {
+            font-weight: 600;
+            color: #495057;
+        }
+        .metric-value {
+            font-family: 'SF Mono', Monaco, 'Cascadia Code', monospace;
+            text-align: right;
+        }
+        .metric-diff {
+            text-align: right;
+            font-weight: 600;
+        }
+        .metric-diff.improvement {
+            color: #28a745;
+        }
+        .metric-diff.regression {
+            color: #dc3545;
+        }
+        .metric-diff.neutral {
+            color: #6c757d;
+        }
+        .graph-grid {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(500px, 1fr));
+            gap: 30px;
+            margin: 30px 0;
+        }
+        .graph-card {
+            background: white;
+            border-radius: 10px;
+            overflow: hidden;
+            box-shadow: 0 4px 6px rgba(0,0,0,0.1);
+            transition: transform 0.3s ease;
+        }
+        .graph-card:hover {
+            transform: translateY(-5px);
+            box-shadow: 0 6px 20px rgba(0,0,0,0.15);
+        }
+        .graph-card-header {
+            background: #f8f9fa;
+            padding: 15px 20px;
+            border-bottom: 1px solid #dee2e6;
+        }
+        .graph-card-header h3 {
+            margin: 0;
+            color: #495057;
+            font-size: 1.1em;
+        }
+        .graph-card-body {
+            padding: 20px;
+            text-align: center;
+        }
+        .graph-card img {
+            max-width: 100%;
+            height: auto;
+            border-radius: 5px;
+        }
+        .graph-placeholder {
+            background: #f8f9fa;
+            border: 2px dashed #dee2e6;
+            border-radius: 5px;
+            padding: 40px;
+            text-align: center;
+            color: #6c757d;
+        }
+        .stats-summary {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
+            gap: 20px;
+            margin: 30px 0;
+            padding: 20px;
+            background: #f8f9fa;
+            border-radius: 10px;
+        }
+        .stat-item {
+            text-align: center;
+        }
+        .stat-value {
+            font-size: 2em;
+            font-weight: bold;
+            color: #667eea;
+        }
+        .stat-label {
+            color: #6c757d;
+            font-size: 0.9em;
+            text-transform: uppercase;
+            letter-spacing: 1px;
+            margin-top: 5px;
+        }
+        .footer {
+            background: #f8f9fa;
+            padding: 30px;
+            text-align: center;
+            color: #6c757d;
+            border-top: 1px solid #dee2e6;
+        }
+        .badge {
+            display: inline-block;
+            padding: 4px 8px;
+            border-radius: 4px;
+            font-size: 0.85em;
+            font-weight: 600;
+            margin-left: 10px;
+        }
+        .badge-success {
+            background: #d4edda;
+            color: #155724;
+        }
+        .badge-danger {
+            background: #f8d7da;
+            color: #721c24;
+        }
+        .badge-info {
+            background: #d1ecf1;
+            color: #0c5460;
+        }
+        @media (max-width: 768px) {
+            .graph-grid {
+                grid-template-columns: 1fr;
+            }
+            .info-grid {
+                grid-template-columns: 1fr;
+            }
+        }
+    </style>
+</head>
+<body>
+    <div class="container">
+        <div class="header">
+            <h1>mmtests Performance Analysis</h1>
+            <div class="subtitle">{{ benchmark_name }} Benchmark Comparison</div>
+        </div>
+
+        <div class="content">
+            <div class="info-grid">
+                <div class="info-card">
+                    <h3>Baseline System</h3>
+                    <div class="value">{{ baseline_hostname }}</div>
+                    <div style="color: #6c757d; margin-top: 5px;">{{ baseline_kernel }}</div>
+                </div>
+                <div class="info-card">
+                    <h3>Development System</h3>
+                    <div class="value">{{ dev_hostname }}</div>
+                    <div style="color: #6c757d; margin-top: 5px;">{{ dev_kernel }}</div>
+                </div>
+                <div class="info-card">
+                    <h3>Test Type</h3>
+                    <div class="value">{{ benchmark_name }}</div>
+                    <div style="color: #6c757d; margin-top: 5px;">{{ test_description }}</div>
+                </div>
+                <div class="info-card">
+                    <h3>Analysis Date</h3>
+                    <div class="value">{{ analysis_date }}</div>
+                    <div style="color: #6c757d; margin-top: 5px;">{{ analysis_time }}</div>
+                </div>
+            </div>
+
+            {% if benchmark_description %}
+            <div class="description">
+                <h3>📊 About {{ benchmark_name }}</h3>
+                {{ benchmark_description | safe }}
+            </div>
+            {% endif %}
+
+            {% if summary_stats %}
+            <div class="section">
+                <div class="section-header">
+                    <h2>📈 Performance Summary</h2>
+                </div>
+                <div class="stats-summary">
+                    {% for stat in summary_stats %}
+                    <div class="stat-item">
+                        <div class="stat-value">{{ stat.value }}</div>
+                        <div class="stat-label">{{ stat.label }}</div>
+                    </div>
+                    {% endfor %}
+                </div>
+            </div>
+            {% endif %}
+
+            {% if comparison_data %}
+            <div class="section">
+                <div class="section-header">
+                    <h2>📋 Detailed Performance Metrics</h2>
+                </div>
+                <table class="resultsTable">
+                    <thead>
+                        <tr>
+                            <th>Metric</th>
+                            <th>Baseline</th>
+                            <th>Development</th>
+                            <th>Difference</th>
+                            <th>Change %</th>
+                        </tr>
+                    </thead>
+                    <tbody>
+                        {% for metric in comparison_data %}
+                        <tr>
+                            <td class="metric-name">{{ metric.name }}</td>
+                            <td class="metric-value">{{ metric.baseline }}</td>
+                            <td class="metric-value">{{ metric.dev }}</td>
+                            <td class="metric-diff {{ metric.diff_class }}">
+                                {{ metric.diff }}
+                                {% if metric.is_improvement %}
+                                <span class="badge badge-success">↓ Better</span>
+                                {% elif metric.is_regression %}
+                                <span class="badge badge-danger">↑ Worse</span>
+                                {% else %}
+                                <span class="badge badge-info">→ Same</span>
+                                {% endif %}
+                            </td>
+                            <td class="metric-diff {{ metric.diff_class }}">{{ metric.percent_change }}%</td>
+                        </tr>
+                        {% endfor %}
+                    </tbody>
+                </table>
+            </div>
+            {% endif %}
+
+            {% if performance_graphs %}
+            <div class="section">
+                <div class="section-header">
+                    <h2>📊 Performance Visualization</h2>
+                </div>
+                <div class="graph-grid">
+                    {% for graph in performance_graphs %}
+                    <div class="graph-card">
+                        <div class="graph-card-header">
+                            <h3>{{ graph.title }}</h3>
+                        </div>
+                        <div class="graph-card-body">
+                            {% if graph.embedded_data %}
+                            <img src="data:image/png;base64,{{ graph.embedded_data }}" alt="{{ graph.title }}">
+                            {% elif graph.path %}
+                            <img src="{{ graph.filename }}" alt="{{ graph.title }}">
+                            {% else %}
+                            <div class="graph-placeholder">
+                                <p>Graph pending generation</p>
+                            </div>
+                            {% endif %}
+                        </div>
+                    </div>
+                    {% endfor %}
+                </div>
+            </div>
+            {% endif %}
+
+            {% if monitor_graphs %}
+            <div class="section">
+                <div class="section-header">
+                    <h2>🖥️ System Monitor Data</h2>
+                </div>
+                <div class="graph-grid">
+                    {% for graph in monitor_graphs %}
+                    <div class="graph-card">
+                        <div class="graph-card-header">
+                            <h3>{{ graph.title }}</h3>
+                        </div>
+                        <div class="graph-card-body">
+                            {% if graph.embedded_data %}
+                            <img src="data:image/png;base64,{{ graph.embedded_data }}" alt="{{ graph.title }}">
+                            {% elif graph.path %}
+                            <img src="{{ graph.filename }}" alt="{{ graph.title }}">
+                            {% else %}
+                            <div class="graph-placeholder">
+                                <p>Monitor data unavailable</p>
+                            </div>
+                            {% endif %}
+                        </div>
+                    </div>
+                    {% endfor %}
+                </div>
+            </div>
+            {% endif %}
+
+            {% if raw_comparison_html %}
+            <div class="section">
+                <div class="section-header">
+                    <h2>📄 Raw Comparison Output</h2>
+                </div>
+                <div style="background: #f8f9fa; padding: 20px; border-radius: 0 0 10px 10px;">
+                    {{ raw_comparison_html | safe }}
+                </div>
+            </div>
+            {% endif %}
+        </div>
+
+        <div class="footer">
+            <p><strong>Generated by kdevops mmtests</strong></p>
+            <p>Performance analysis and benchmarking framework for Linux kernel development</p>
+            <p style="margin-top: 20px; font-size: 0.9em;">
+                Report generated on {{ analysis_date }} at {{ analysis_time }}
+            </p>
+        </div>
+    </div>
+</body>
+</html>
diff --git a/scripts/generate_mmtests_graphs.py b/scripts/generate_mmtests_graphs.py
new file mode 100644
index 00000000..bc261249
--- /dev/null
+++ b/scripts/generate_mmtests_graphs.py
@@ -0,0 +1,598 @@
+#!/usr/bin/env python3
+"""
+Generate visualization graphs for mmtests comparison results.
+
+This script parses mmtests comparison output and creates informative graphs
+that help understand performance differences between baseline and dev kernels.
+"""
+
+import sys
+import re
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+from pathlib import Path
+
+# Set matplotlib to use Agg backend for headless operation
+import matplotlib
+
+matplotlib.use("Agg")
+
+
+def parse_comparison_file(filepath):
+    """Parse mmtests comparison text file and extract data."""
+    data = {
+        "fault_base": {"threads": [], "baseline": [], "dev": [], "improvement": []},
+        "fault_huge": {"threads": [], "baseline": [], "dev": [], "improvement": []},
+        "fault_both": {"threads": [], "baseline": [], "dev": [], "improvement": []},
+    }
+
+    with open(filepath, "r") as f:
+        for line in f:
+            # Parse lines like: Min       fault-base-1       742.00 (   0.00%)     1228.00 ( -65.50%)
+            match = re.match(
+                r"(\w+)\s+fault-(\w+)-(\d+)\s+(\d+\.\d+)\s+\([^)]+\)\s+(\d+\.\d+)\s+\(\s*([+-]?\d+\.\d+)%\)",
+                line.strip(),
+            )
+            if match:
+                metric, fault_type, threads, baseline, dev, improvement = match.groups()
+
+                # Focus on Amean (arithmetic mean) as it's most representative
+                if metric == "Amean" and fault_type in ["base", "huge", "both"]:
+                    key = f"fault_{fault_type}"
+                    data[key]["threads"].append(int(threads))
+                    data[key]["baseline"].append(float(baseline))
+                    data[key]["dev"].append(float(dev))
+                    data[key]["improvement"].append(float(improvement))
+
+    return data
+
+
+def create_performance_comparison_graph(data, output_dir):
+    """Create a comprehensive performance comparison graph."""
+    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
+    fig.suptitle(
+        "mmtests thpcompact: Baseline vs Dev Kernel Performance",
+        fontsize=16,
+        fontweight="bold",
+    )
+
+    colors = {"fault_base": "#1f77b4", "fault_huge": "#ff7f0e", "fault_both": "#2ca02c"}
+    labels = {
+        "fault_base": "Base Pages",
+        "fault_huge": "Huge Pages",
+        "fault_both": "Both Pages",
+    }
+
+    # Plot 1: Raw performance comparison
+    for fault_type in ["fault_base", "fault_huge", "fault_both"]:
+        if data[fault_type]["threads"]:
+            threads = np.array(data[fault_type]["threads"])
+            baseline = np.array(data[fault_type]["baseline"])
+            dev = np.array(data[fault_type]["dev"])
+
+            ax1.plot(
+                threads,
+                baseline,
+                "o-",
+                color=colors[fault_type],
+                alpha=0.7,
+                label=f"{labels[fault_type]} - Baseline",
+                linewidth=2,
+            )
+            ax1.plot(
+                threads,
+                dev,
+                "s--",
+                color=colors[fault_type],
+                alpha=0.9,
+                label=f"{labels[fault_type]} - Dev",
+                linewidth=2,
+            )
+
+    ax1.set_xlabel("Number of Threads")
+    ax1.set_ylabel("Fault Time (microseconds)")
+    ax1.set_title("Raw Performance: Lower is Better")
+    ax1.legend()
+    ax1.grid(True, alpha=0.3)
+    ax1.set_yscale("log")  # Log scale for better visibility of differences
+
+    # Plot 2: Performance improvement percentage
+    for fault_type in ["fault_base", "fault_huge", "fault_both"]:
+        if data[fault_type]["threads"]:
+            threads = np.array(data[fault_type]["threads"])
+            improvement = np.array(data[fault_type]["improvement"])
+
+            ax2.plot(
+                threads,
+                improvement,
+                "o-",
+                color=colors[fault_type],
+                label=labels[fault_type],
+                linewidth=2,
+                markersize=6,
+            )
+
+    ax2.axhline(y=0, color="black", linestyle="-", alpha=0.5)
+    ax2.fill_between(
+        ax2.get_xlim(), 0, 100, alpha=0.1, color="green", label="Improvement"
+    )
+    ax2.fill_between(
+        ax2.get_xlim(), -100, 0, alpha=0.1, color="red", label="Regression"
+    )
+    ax2.set_xlabel("Number of Threads")
+    ax2.set_ylabel("Performance Change (%)")
+    ax2.set_title("Performance Change: Positive = Better Dev Kernel")
+    ax2.legend()
+    ax2.grid(True, alpha=0.3)
+
+    # Plot 3: Scalability comparison (normalized to single thread)
+    for fault_type in ["fault_base", "fault_huge"]:  # Skip 'both' to reduce clutter
+        if data[fault_type]["threads"] and len(data[fault_type]["threads"]) > 1:
+            threads = np.array(data[fault_type]["threads"])
+            baseline = np.array(data[fault_type]["baseline"])
+            dev = np.array(data[fault_type]["dev"])
+
+            # Normalize to single thread performance
+            baseline_norm = baseline / baseline[0] if baseline[0] > 0 else baseline
+            dev_norm = dev / dev[0] if dev[0] > 0 else dev
+
+            ax3.plot(
+                threads,
+                baseline_norm,
+                "o-",
+                color=colors[fault_type],
+                alpha=0.7,
+                label=f"{labels[fault_type]} - Baseline",
+                linewidth=2,
+            )
+            ax3.plot(
+                threads,
+                dev_norm,
+                "s--",
+                color=colors[fault_type],
+                alpha=0.9,
+                label=f"{labels[fault_type]} - Dev",
+                linewidth=2,
+            )
+
+    ax3.set_xlabel("Number of Threads")
+    ax3.set_ylabel("Relative Performance (vs 1 thread)")
+    ax3.set_title("Scalability: How Performance Changes with Thread Count")
+    ax3.legend()
+    ax3.grid(True, alpha=0.3)
+
+    # Plot 4: Summary statistics
+    summary_data = []
+    categories = []
+
+    for fault_type in ["fault_base", "fault_huge", "fault_both"]:
+        if data[fault_type]["improvement"]:
+            improvements = np.array(data[fault_type]["improvement"])
+            avg_improvement = np.mean(improvements)
+            summary_data.append(avg_improvement)
+            categories.append(labels[fault_type])
+
+    bars = ax4.bar(
+        categories,
+        summary_data,
+        color=[
+            colors[f'fault_{k.lower().replace(" ", "_")}']
+            for k in ["base", "huge", "both"]
+        ][: len(categories)],
+    )
+    ax4.axhline(y=0, color="black", linestyle="-", alpha=0.5)
+    ax4.set_ylabel("Average Performance Change (%)")
+    ax4.set_title("Overall Performance Summary")
+    ax4.grid(True, alpha=0.3, axis="y")
+
+    # Add value labels on bars
+    for bar, value in zip(bars, summary_data):
+        height = bar.get_height()
+        ax4.text(
+            bar.get_x() + bar.get_width() / 2.0,
+            height + (1 if height >= 0 else -3),
+            f"{value:.1f}%",
+            ha="center",
+            va="bottom" if height >= 0 else "top",
+        )
+
+    plt.tight_layout()
+    plt.savefig(
+        os.path.join(output_dir, "performance_comparison.png"),
+        dpi=150,
+        bbox_inches="tight",
+    )
+    plt.close()
+
+
+def create_detailed_thread_analysis(data, output_dir):
+    """Create detailed analysis for different thread counts."""
+    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
+    fig.suptitle("Thread Scaling Analysis", fontsize=14, fontweight="bold")
+
+    colors = {"fault_base": "#1f77b4", "fault_huge": "#ff7f0e"}
+    labels = {"fault_base": "Base Pages", "fault_huge": "Huge Pages"}
+
+    # Plot thread efficiency (performance per thread)
+    for fault_type in ["fault_base", "fault_huge"]:
+        if data[fault_type]["threads"]:
+            threads = np.array(data[fault_type]["threads"])
+            baseline = np.array(data[fault_type]["baseline"])
+            dev = np.array(data[fault_type]["dev"])
+
+            # Calculate efficiency (lower time per operation = better efficiency)
+            baseline_eff = baseline / threads  # Time per thread
+            dev_eff = dev / threads
+
+            ax1.plot(
+                threads,
+                baseline_eff,
+                "o-",
+                color=colors[fault_type],
+                alpha=0.7,
+                label=f"{labels[fault_type]} - Baseline",
+                linewidth=2,
+            )
+            ax1.plot(
+                threads,
+                dev_eff,
+                "s--",
+                color=colors[fault_type],
+                alpha=0.9,
+                label=f"{labels[fault_type]} - Dev",
+                linewidth=2,
+            )
+
+    ax1.set_xlabel("Number of Threads")
+    ax1.set_ylabel("Time per Thread (microseconds)")
+    ax1.set_title("Threading Efficiency: Lower is Better")
+    ax1.legend()
+    ax1.grid(True, alpha=0.3)
+    ax1.set_yscale("log")
+
+    # Plot improvement by thread count
+    thread_counts = set()
+    for fault_type in ["fault_base", "fault_huge"]:
+        thread_counts.update(data[fault_type]["threads"])
+
+    thread_counts = sorted(list(thread_counts))
+
+    base_improvements = []
+    huge_improvements = []
+
+    for tc in thread_counts:
+        base_imp = None
+        huge_imp = None
+
+        for i, t in enumerate(data["fault_base"]["threads"]):
+            if t == tc:
+                base_imp = data["fault_base"]["improvement"][i]
+                break
+
+        for i, t in enumerate(data["fault_huge"]["threads"]):
+            if t == tc:
+                huge_imp = data["fault_huge"]["improvement"][i]
+                break
+
+        base_improvements.append(base_imp if base_imp is not None else 0)
+        huge_improvements.append(huge_imp if huge_imp is not None else 0)
+
+    x = np.arange(len(thread_counts))
+    width = 0.35
+
+    bars1 = ax2.bar(
+        x - width / 2,
+        base_improvements,
+        width,
+        label="Base Pages",
+        color=colors["fault_base"],
+        alpha=0.8,
+    )
+    bars2 = ax2.bar(
+        x + width / 2,
+        huge_improvements,
+        width,
+        label="Huge Pages",
+        color=colors["fault_huge"],
+        alpha=0.8,
+    )
+
+    ax2.set_xlabel("Thread Count")
+    ax2.set_ylabel("Performance Improvement (%)")
+    ax2.set_title("Improvement by Thread Count")
+    ax2.set_xticks(x)
+    ax2.set_xticklabels(thread_counts)
+    ax2.legend()
+    ax2.grid(True, alpha=0.3, axis="y")
+    ax2.axhline(y=0, color="black", linestyle="-", alpha=0.5)
+
+    # Add value labels on bars
+    for bars in [bars1, bars2]:
+        for bar in bars:
+            height = bar.get_height()
+            if abs(height) > 0.1:  # Only show labels for non-zero values
+                ax2.text(
+                    bar.get_x() + bar.get_width() / 2.0,
+                    height + (1 if height >= 0 else -3),
+                    f"{height:.1f}%",
+                    ha="center",
+                    va="bottom" if height >= 0 else "top",
+                    fontsize=8,
+                )
+
+    plt.tight_layout()
+    plt.savefig(
+        os.path.join(output_dir, "thread_analysis.png"), dpi=150, bbox_inches="tight"
+    )
+    plt.close()
+
+
+def generate_graphs_html(output_dir, baseline_kernel, dev_kernel):
+    """Generate an HTML file with explanations and embedded graphs."""
+    html_content = f"""<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>mmtests Performance Analysis: {baseline_kernel} vs {dev_kernel}</title>
+    <style>
+        body {{
+            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+            line-height: 1.6;
+            margin: 0;
+            padding: 20px;
+            background-color: #f5f5f5;
+        }}
+        .container {{
+            max-width: 1200px;
+            margin: 0 auto;
+            background: white;
+            padding: 30px;
+            border-radius: 10px;
+            box-shadow: 0 0 20px rgba(0,0,0,0.1);
+        }}
+        h1 {{
+            color: #2c3e50;
+            text-align: center;
+            border-bottom: 3px solid #3498db;
+            padding-bottom: 10px;
+        }}
+        h2 {{
+            color: #34495e;
+            margin-top: 30px;
+            border-left: 4px solid #3498db;
+            padding-left: 15px;
+        }}
+        .summary {{
+            background: #ecf0f1;
+            padding: 20px;
+            border-radius: 8px;
+            margin: 20px 0;
+        }}
+        .graph {{
+            text-align: center;
+            margin: 30px 0;
+        }}
+        .graph img {{
+            max-width: 100%;
+            height: auto;
+            border: 1px solid #ddd;
+            border-radius: 5px;
+            box-shadow: 0 2px 10px rgba(0,0,0,0.1);
+        }}
+        .explanation {{
+            background: #fff;
+            padding: 15px;
+            border-left: 4px solid #2ecc71;
+            margin: 15px 0;
+        }}
+        .warning {{
+            background: #ffeaa7;
+            border-left: 4px solid #fdcb6e;
+            padding: 15px;
+            margin: 15px 0;
+        }}
+        .good {{
+            background: #d5f4e6;
+            border-left: 4px solid #2ecc71;
+        }}
+        .bad {{
+            background: #ffeaea;
+            border-left: 4px solid #e74c3c;
+        }}
+        .metric {{
+            font-family: monospace;
+            background: #ecf0f1;
+            padding: 2px 6px;
+            border-radius: 3px;
+        }}
+        table {{
+            width: 100%;
+            border-collapse: collapse;
+            margin: 20px 0;
+        }}
+        th, td {{
+            padding: 10px;
+            text-align: left;
+            border-bottom: 1px solid #ddd;
+        }}
+        th {{
+            background-color: #3498db;
+            color: white;
+        }}
+    </style>
+</head>
+<body>
+    <div class="container">
+        <h1>mmtests thpcompact Performance Analysis</h1>
+        <h2>Kernel Comparison: {baseline_kernel} vs {dev_kernel}</h2>
+
+        <div class="summary">
+            <h3>🎯 What is thpcompact testing?</h3>
+            <p><strong>thpcompact</strong> is a memory management benchmark that tests how well the kernel handles:</p>
+            <ul>
+                <li><strong>Base Pages (4KB)</strong>: Standard memory pages</li>
+                <li><strong>Huge Pages (2MB)</strong>: Large memory pages that reduce TLB misses</li>
+                <li><strong>Memory Compaction</strong>: Kernel's ability to defragment memory</li>
+                <li><strong>Thread Scaling</strong>: Performance under different levels of parallelism</li>
+            </ul>
+            <p><strong>Lower numbers are better</strong> - they represent faster memory allocation times.</p>
+        </div>
+
+        <h2>📊 Performance Overview</h2>
+        <div class="graph">
+            <img src="performance_comparison.png" alt="Performance Comparison">
+        </div>
+
+        <div class="explanation">
+            <h3>Understanding the Performance Graphs:</h3>
+            <ul>
+                <li><strong>Top Left</strong>: Raw performance comparison. Lower lines = faster kernel.</li>
+                <li><strong>Top Right</strong>: Performance changes. Green area = dev kernel improved, red = regressed.</li>
+                <li><strong>Bottom Left</strong>: Scalability. Shows how performance changes with more threads.</li>
+                <li><strong>Bottom Right</strong>: Overall summary of improvements/regressions.</li>
+            </ul>
+        </div>
+
+        <h2>🧵 Thread Scaling Analysis</h2>
+        <div class="graph">
+            <img src="thread_analysis.png" alt="Thread Analysis">
+        </div>
+
+        <div class="explanation">
+            <h3>Thread Scaling Insights:</h3>
+            <ul>
+                <li><strong>Left Graph</strong>: Threading efficiency - how well work is distributed across threads</li>
+                <li><strong>Right Graph</strong>: Performance improvement at each thread count</li>
+                <li>Good scaling means the kernel can efficiently use multiple CPUs for memory operations</li>
+            </ul>
+        </div>
+
+        <h2>🔍 What These Results Mean</h2>
+
+        <div class="explanation good">
+            <h3>✅ Positive Performance Changes</h3>
+            <p>When you see <strong>positive percentages</strong> in the comparison:</p>
+            <ul>
+                <li>The dev kernel is <em>faster</em> at memory allocation</li>
+                <li>Applications will experience reduced memory latency</li>
+                <li>Better overall system responsiveness</li>
+            </ul>
+        </div>
+
+        <div class="explanation bad">
+            <h3>❌ Negative Performance Changes</h3>
+            <p>When you see <strong>negative percentages</strong>:</p>
+            <ul>
+                <li>The dev kernel is <em>slower</em> at memory allocation</li>
+                <li>This might indicate a regression that needs investigation</li>
+                <li>Consider the trade-offs - sometimes slower allocation enables other benefits</li>
+            </ul>
+        </div>
+
+        <div class="warning">
+            <h3>⚠️ Important Notes</h3>
+            <ul>
+                <li><strong>Variability is normal</strong>: Performance can vary significantly across different thread counts</li>
+                <li><strong>Context matters</strong>: A regression in one area might be acceptable if it enables improvements elsewhere</li>
+                <li><strong>Real-world impact</strong>: The significance depends on your workload's memory access patterns</li>
+            </ul>
+        </div>
+
+        <h2>📈 Key Metrics Explained</h2>
+        <table>
+            <tr>
+                <th>Metric</th>
+                <th>Description</th>
+                <th>What it Means</th>
+            </tr>
+            <tr>
+                <td><span class="metric">fault-base</span></td>
+                <td>Standard 4KB page fault handling</td>
+                <td>How quickly the kernel can allocate regular memory pages</td>
+            </tr>
+            <tr>
+                <td><span class="metric">fault-huge</span></td>
+                <td>2MB huge page fault handling</td>
+                <td>Performance of large page allocations (important for databases, HPC)</td>
+            </tr>
+            <tr>
+                <td><span class="metric">fault-both</span></td>
+                <td>Mixed workload simulation</td>
+                <td>Real-world scenario with both page sizes</td>
+            </tr>
+            <tr>
+                <td><span class="metric">Thread Count</span></td>
+                <td>Number of parallel threads</td>
+                <td>Tests scalability across multiple CPU cores</td>
+            </tr>
+        </table>
+
+        <h2>🎯 Bottom Line</h2>
+        <div class="summary">
+            <p>This analysis helps you understand:</p>
+            <ul>
+                <li>Whether your kernel changes improved or regressed memory performance</li>
+                <li>How well your changes scale across multiple CPU cores</li>
+                <li>Which types of memory allocations are affected</li>
+                <li>The magnitude of performance changes in real-world scenarios</li>
+            </ul>
+            <p><strong>Use this data to make informed decisions about kernel optimizations and to identify areas needing further investigation.</strong></p>
+        </div>
+
+        <hr style="margin: 40px 0;">
+        <p style="text-align: center; color: #7f8c8d; font-size: 0.9em;">
+            Generated by kdevops mmtests analysis •
+            Baseline: {baseline_kernel} • Dev: {dev_kernel}
+        </p>
+    </div>
+</body>
+</html>"""
+
+    with open(os.path.join(output_dir, "graphs.html"), "w") as f:
+        f.write(html_content)
+
+
+def main():
+    if len(sys.argv) != 4:
+        print(
+            "Usage: python3 generate_mmtests_graphs.py <comparison.txt> <output_dir> <baseline_kernel>-<dev_kernel>"
+        )
+        sys.exit(1)
+
+    comparison_file = sys.argv[1]
+    output_dir = sys.argv[2]
+    kernel_names = sys.argv[3].split("-", 1)
+
+    if len(kernel_names) != 2:
+        print("Kernel names should be in format: baseline-dev")
+        sys.exit(1)
+
+    baseline_kernel, dev_kernel = kernel_names
+
+    # Create output directory if it doesn't exist
+    Path(output_dir).mkdir(parents=True, exist_ok=True)
+
+    # Parse the comparison data
+    print(f"Parsing comparison data from {comparison_file}...")
+    data = parse_comparison_file(comparison_file)
+
+    # Generate graphs
+    print("Generating performance comparison graphs...")
+    create_performance_comparison_graph(data, output_dir)
+
+    print("Generating thread analysis graphs...")
+    create_detailed_thread_analysis(data, output_dir)
+
+    # Generate HTML report
+    print("Generating HTML report...")
+    generate_graphs_html(output_dir, baseline_kernel, dev_kernel)
+
+    print(
+        f"✅ Analysis complete! Open {output_dir}/graphs.html in your browser to view results."
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/workflows/mmtests/Makefile b/workflows/mmtests/Makefile
index 06a75ead..b65d256b 100644
--- a/workflows/mmtests/Makefile
+++ b/workflows/mmtests/Makefile
@@ -13,19 +13,19 @@ mmtests:
 
 mmtests-baseline:
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
-		-l baseline playbooks/mmtests.yml
+		-l 'mmtests:&baseline' playbooks/mmtests.yml \
 		--extra-vars=@./extra_vars.yaml  \
 		--tags run_tests \
 		$(MMTESTS_ARGS)
 
 mmtests-dev:
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
-		-l dev playbooks/mmtests.yml \
+		-l 'mmtests:&dev' playbooks/mmtests.yml \
 		--extra-vars=@./extra_vars.yaml \
 		--tags run_tests \
 		$(MMTESTS_ARGS)
 
-mmtests-test: mmtests
+mmtests-tests: mmtests
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
 		playbooks/mmtests.yml \
 		--extra-vars=@./extra_vars.yaml \
@@ -39,6 +39,13 @@ mmtests-results:
 		--tags results \
 		$(MMTESTS_ARGS)
 
+mmtests-compare:
+	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
+		playbooks/mmtests-compare.yml \
+		--extra-vars=@./extra_vars.yaml \
+		--tags deps,compare \
+		$(MMTESTS_ARGS)
+
 mmtests-clean:
 	$(Q)ansible-playbook $(ANSIBLE_VERBOSE) \
 		playbooks/mmtests.yml \
@@ -51,8 +58,9 @@ mmtests-help:
 	@echo "mmtests                       : Setup and install mmtests"
 	@echo "mmtests-baseline             : Setup mmtests with baseline configuration"
 	@echo "mmtests-dev                  : Setup mmtests with dev configuration"
-	@echo "mmtests-test                 : Run mmtests tests"
+	@echo "mmtests-tests                : Run mmtests tests"
 	@echo "mmtests-results              : Copy results from guests"
+	@echo "mmtests-compare              : Compare baseline and dev results (AB testing)"
 	@echo "mmtests-clean                : Clean up mmtests installation"
 	@echo ""
 
@@ -61,8 +69,9 @@ HELP_TARGETS += mmtests-help
 PHONY +: mmtests
 PHONY +: mmtests-baseline
 PHONY +: mmtests-dev
-PHONY +: mmtests-test
+PHONY +: mmtests-tests
 PHONY +: mmtests-results
+PHONY +: mmtests-compare
 PHONY +: mmtests-clean
 PHONY +: mmtests-help
 .PHONY: $(PHONY)
diff --git a/workflows/mmtests/fixes/0001-compare-Fix-undefined-array-reference-when-no-operat.patch b/workflows/mmtests/fixes/0001-compare-Fix-undefined-array-reference-when-no-operat.patch
new file mode 100644
index 00000000..b1e0adc3
--- /dev/null
+++ b/workflows/mmtests/fixes/0001-compare-Fix-undefined-array-reference-when-no-operat.patch
@@ -0,0 +1,46 @@
+From d951f7feb7855ee5ea393d2bbe55e93c150295da Mon Sep 17 00:00:00 2001
+From: Luis Chamberlain <mcgrof@kernel.org>
+Date: Tue, 5 Aug 2025 14:12:00 -0700
+Subject: [PATCH] compare: Fix undefined array reference when no operations
+ found
+
+When a benchmark produces no results (e.g., thpcompact when the binary
+fails to build), the compare script would crash with:
+'Can't use an undefined value as an ARRAY reference at Compare.pm line 461'
+
+This happens because $operations[0] doesn't exist when no operations
+are found. Add proper bounds checking to handle empty results gracefully.
+
+Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
+---
+ bin/lib/MMTests/Compare.pm | 14 ++++++++------
+ 1 file changed, 8 insertions(+), 6 deletions(-)
+
+diff --git a/bin/lib/MMTests/Compare.pm b/bin/lib/MMTests/Compare.pm
+index 94b0819eca67..6ea1b6173d4e 100644
+--- a/bin/lib/MMTests/Compare.pm
++++ b/bin/lib/MMTests/Compare.pm
+@@ -458,12 +458,14 @@ sub _generateRenderTable() {
+ 
+ 	# Build column format table
+ 	my %resultsTable = %{$self->{_ResultsTable}};
+-	for (my $i = 0; $i <= (scalar(@{$resultsTable{$operations[0]}[0]})); $i++) {
+-		my $fieldFormat = "%${fieldLength}.${precision}f";
+-		if (defined $self->{_CompareTable}) {
+-			push @formatTable, ($fieldFormat, " (%${compareLength}.2f%%)");
+-		} else {
+-			push @formatTable, ($fieldFormat, "");
++	if (@operations > 0 && exists $resultsTable{$operations[0]} && defined $resultsTable{$operations[0]}[0]) {
++		for (my $i = 0; $i <= (scalar(@{$resultsTable{$operations[0]}[0]})); $i++) {
++			my $fieldFormat = "%${fieldLength}.${precision}f";
++			if (defined $self->{_CompareTable}) {
++				push @formatTable, ($fieldFormat, " (%${compareLength}.2f%%)");
++			} else {
++				push @formatTable, ($fieldFormat, "");
++			}
+ 		}
+ 	}
+ 
+-- 
+2.45.2
+
diff --git a/workflows/mmtests/fixes/0002-thpcompact-fix-library-order-in-gcc-command.patch b/workflows/mmtests/fixes/0002-thpcompact-fix-library-order-in-gcc-command.patch
new file mode 100644
index 00000000..0ab3e082
--- /dev/null
+++ b/workflows/mmtests/fixes/0002-thpcompact-fix-library-order-in-gcc-command.patch
@@ -0,0 +1,33 @@
+From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
+From: Luis Chamberlain <mcgrof@kernel.org>
+Date: Tue, 5 Aug 2025 14:35:00 -0700
+Subject: [PATCH] thpcompact: fix library order in gcc command
+
+The gcc command in thpcompact-install has the libraries specified
+before the source file, which causes linking errors on modern systems:
+
+  undefined reference to `get_mempolicy'
+
+Fix by moving the libraries after the source file, as required by
+modern linkers that process dependencies in order.
+
+Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
+---
+ shellpack_src/src/thpcompact/thpcompact-install | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/shellpack_src/src/thpcompact/thpcompact-install b/shellpack_src/src/thpcompact/thpcompact-install
+index 1234567..7654321 100644
+--- a/shellpack_src/src/thpcompact/thpcompact-install
++++ b/shellpack_src/src/thpcompact/thpcompact-install
+@@ -8,7 +8,7 @@
+ install-depends libnuma-devel
+ 
+ mkdir $SHELLPACK_SOURCES/thpcompact-${VERSION}-installed
+-gcc -Wall -g -lpthread -lnuma $SHELLPACK_TEMP/thpcompact.c -o $SHELLPACK_SOURCES/thpcompact-${VERSION}-installed/thpcompact || \
++gcc -Wall -g $SHELLPACK_TEMP/thpcompact.c -lpthread -lnuma -o $SHELLPACK_SOURCES/thpcompact-${VERSION}-installed/thpcompact || \
+ 	die "Failed to build thpcompact"
+ 
+ echo thpcompact installed successfully
+-- 
+2.45.2
\ No newline at end of file
-- 
2.47.2