[PATCH 12/23] reboot-limit: add graph visualization support for results

From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>, Daniel Gomez <da.gomez@kruces.com>,
	kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 12/23] reboot-limit: add graph visualization support for results
Date: Mon, 11 Aug 2025 15:24:39 -0700
Message-ID: <20250811222452.2213071-13-mcgrof@kernel.org>
In-Reply-To: <20250811222452.2213071-1-mcgrof@kernel.org>

Add support to analyze and visualize reboot-limit workflow results. This
helps users understand boot performance trends and identify anomalies
across multiple reboots.

The implementation includes:
- analyze_results.py: Parses systemd-analyze output and generates graphs
  showing boot time trends, component breakdown (kernel/initrd/userspace),
  and statistical analysis
- generate_sample_data.py: Creates realistic test data for development
- New Makefile targets:
  - reboot-limit-results: Analyze and display summary statistics
  - reboot-limit-graph: Generate visualization graphs
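
As a rough illustration of the parsing step described above, the sketch
below handles the "Startup finished in ..." summary line with and without
an initrd component. It is not the code in analyze_results.py: the helper
name parse_line and the single combined regular expression are assumptions
made for this example, and it only covers plain seconds values such as
"2.345s".

```python
import re
from typing import Dict, Optional

# Hypothetical standalone helper (not part of the patch). The optional
# non-capturing group makes the initrd component optional in one pattern.
STARTUP_RE = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ "
    r"(?:([\d.]+)s \(initrd\) \+ )?"
    r"([\d.]+)s \(userspace\) = ([\d.]+)s"
)


def parse_line(line: str) -> Optional[Dict[str, float]]:
    """Return component times in seconds, or None if the line does not match."""
    m = STARTUP_RE.search(line)
    if m is None:
        return None
    kernel, initrd, userspace, total = m.groups()
    return {
        "kernel": float(kernel),
        # Systems without an initrd omit that component; report it as 0.0.
        "initrd": float(initrd) if initrd else 0.0,
        "userspace": float(userspace),
        "total": float(total),
    }


if __name__ == "__main__":
    print(parse_line(
        "Startup finished in 2.345s (kernel) + 1.234s (initrd) "
        "+ 5.678s (userspace) = 9.257s"
    ))
    print(parse_line("Startup finished in 2.500s (kernel) + 5.000s (userspace) = 7.500s"))
```

Lines that do not match, such as blank lines or other output, yield None
and can simply be skipped by the caller.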

The visualization provides:
- Stacked area charts showing boot component times
- Total boot time trends with statistical indicators (mean, median, stddev)
- Summary statistics for each host including min/max/range
- Support for multiple hosts (baseline and dev)
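
The summary statistics listed above can be sketched with the standard
library alone. The function name summarize and the sample values are
made up for illustration; the stdev fallback to 0.0 for a single sample
follows the same guard the patch uses, since statistics.stdev() requires
at least two data points.

```python
import statistics
from typing import Dict, List


def summarize(times: List[float]) -> Dict[str, float]:
    """Summarize total boot times (seconds); empty input gives an empty dict."""
    if not times:
        return {}
    lo, hi = min(times), max(times)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        # Sample standard deviation is undefined for a single boot.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
    }


if __name__ == "__main__":
    sample_totals = [9.0, 10.0, 11.0, 14.0]  # made-up totals, in seconds
    for name, value in summarize(sample_totals).items():
        print(f"{name}: {value:.2f}s")
```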

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 .../demos/reboot-limit/analyze_results.py     | 304 ++++++++++++++++++
 .../reboot-limit/generate_sample_data.py      |  73 +++++
 workflows/demos/reboot-limit/Makefile         |  13 +-
 3 files changed, 388 insertions(+), 2 deletions(-)
 create mode 100755 scripts/workflows/demos/reboot-limit/analyze_results.py
 create mode 100755 scripts/workflows/demos/reboot-limit/generate_sample_data.py

diff --git a/scripts/workflows/demos/reboot-limit/analyze_results.py b/scripts/workflows/demos/reboot-limit/analyze_results.py
new file mode 100755
index 00000000..8842b409
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/analyze_results.py
@@ -0,0 +1,304 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Analyze and visualize reboot-limit workflow results.
+
+This script parses systemd-analyze output and generates graphs showing:
+- Boot time trends across reboots
+- Individual component times (kernel, initrd, userspace)
+- Statistical analysis of boot performance
+"""
+
+import os
+import sys
+import re
+import argparse
+import statistics
+from pathlib import Path
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+from typing import Dict, List, Optional
+
+
+class RebootLimitAnalyzer:
+    """Analyzes reboot-limit workflow results."""
+
+    def __init__(self, results_dir: str):
+        self.results_dir = Path(results_dir)
+        self.hosts_data: Dict[str, Dict] = {}
+
+    def parse_systemd_analyze_line(self, line: str) -> Optional[Dict[str, float]]:
+        """
+        Parse a systemd-analyze output line.
+
+        Example line:
+        Startup finished in 2.345s (kernel) + 1.234s (initrd) + 5.678s (userspace) = 9.257s
+
+        Returns dict with times in seconds or None if parse fails.
+        """
+        # Pattern for systemd-analyze output
+        pattern = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(initrd\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        # Alternative pattern without initrd (for systems without initrd)
+        pattern_no_initrd = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        match = re.search(pattern, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": float(match.group(2)),
+                "userspace": float(match.group(3)),
+                "total": float(match.group(4)),
+            }
+
+        match = re.search(pattern_no_initrd, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": 0.0,
+                "userspace": float(match.group(2)),
+                "total": float(match.group(3)),
+            }
+
+        return None
+
+    def load_host_data(self, host_dir: Path) -> Dict:
+        """Load and parse data for a single host."""
+        data = {"boot_count": 0, "boot_times": []}
+
+        # Read boot count
+        count_file = host_dir / "reboot-count.txt"
+        if count_file.exists():
+            with open(count_file, "r") as f:
+                content = f.read().strip()
+                if content:
+                    data["boot_count"] = int(content)
+
+        # Read systemd-analyze results
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        if analyze_file.exists():
+            with open(analyze_file, "r") as f:
+                for line in f:
+                    parsed = self.parse_systemd_analyze_line(line.strip())
+                    if parsed:
+                        data["boot_times"].append(parsed)
+
+        return data
+
+    def load_all_data(self):
+        """Load data for all hosts in the results directory."""
+        # Look for host directories
+        for item in self.results_dir.iterdir():
+            if item.is_dir():
+                self.hosts_data[item.name] = self.load_host_data(item)
+
+    def calculate_statistics(self, times: List[float]) -> Dict[str, float]:
+        """Calculate statistical measures for a list of times."""
+        if not times:
+            return {}
+
+        return {
+            "min": min(times),
+            "max": max(times),
+            "mean": statistics.mean(times),
+            "median": statistics.median(times),
+            "stdev": statistics.stdev(times) if len(times) > 1 else 0,
+        }
+
+    def plot_boot_times(self, output_file: str = "reboot_limit_analysis.png"):
+        """Generate plots for boot time analysis."""
+        if not self.hosts_data:
+            print("No data to plot")
+            return
+
+        # Create figure with subplots
+        num_hosts = len(self.hosts_data)
+        fig, axes = plt.subplots(num_hosts, 2, figsize=(14, 6 * num_hosts))
+
+        if num_hosts == 1:
+            axes = axes.reshape(1, -1)
+
+        for idx, (host, data) in enumerate(self.hosts_data.items()):
+            boot_times = data["boot_times"]
+            if not boot_times:
+                continue
+
+            # Extract time series
+            boot_numbers = list(range(1, len(boot_times) + 1))
+            kernel_times = [bt["kernel"] for bt in boot_times]
+            initrd_times = [bt["initrd"] for bt in boot_times]
+            userspace_times = [bt["userspace"] for bt in boot_times]
+            total_times = [bt["total"] for bt in boot_times]
+
+            # Plot 1: Stacked area chart of boot components
+            ax1 = axes[idx, 0]
+            ax1.fill_between(boot_numbers, 0, kernel_times, alpha=0.7, label="Kernel")
+            ax1.fill_between(
+                boot_numbers,
+                kernel_times,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                alpha=0.7,
+                label="Initrd",
+            )
+            ax1.fill_between(
+                boot_numbers,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                total_times,
+                alpha=0.7,
+                label="Userspace",
+            )
+
+            ax1.set_xlabel("Boot Number")
+            ax1.set_ylabel("Time (seconds)")
+            ax1.set_title(f"{host}: Boot Component Times")
+            ax1.legend()
+            ax1.grid(True, alpha=0.3)
+            ax1.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Plot 2: Total boot time with statistics
+            ax2 = axes[idx, 1]
+            ax2.plot(
+                boot_numbers, total_times, "b-", linewidth=2, label="Total Boot Time"
+            )
+
+            # Add statistical lines
+            stats = self.calculate_statistics(total_times)
+            if stats:
+                ax2.axhline(
+                    y=stats["mean"],
+                    color="r",
+                    linestyle="--",
+                    label=f"Mean: {stats['mean']:.2f}s",
+                )
+                ax2.axhline(
+                    y=stats["median"],
+                    color="g",
+                    linestyle="--",
+                    label=f"Median: {stats['median']:.2f}s",
+                )
+
+                # Add standard deviation band
+                if stats["stdev"] > 0:
+                    ax2.fill_between(
+                        boot_numbers,
+                        stats["mean"] - stats["stdev"],
+                        stats["mean"] + stats["stdev"],
+                        alpha=0.2,
+                        color="gray",
+                        label=f"±1 StdDev: {stats['stdev']:.2f}s",
+                    )
+
+            ax2.set_xlabel("Boot Number")
+            ax2.set_ylabel("Time (seconds)")
+            ax2.set_title(f"{host}: Total Boot Time Analysis")
+            ax2.legend()
+            ax2.grid(True, alpha=0.3)
+            ax2.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Add text box with statistics
+            stats_text = f"Boots: {data['boot_count']}\n"
+            if stats:
+                stats_text += f"Min: {stats['min']:.2f}s\n"
+                stats_text += f"Max: {stats['max']:.2f}s\n"
+                stats_text += f"Range: {stats['max'] - stats['min']:.2f}s"
+
+            ax2.text(
+                0.02,
+                0.98,
+                stats_text,
+                transform=ax2.transAxes,
+                verticalalignment="top",
+                bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5),
+            )
+
+        plt.tight_layout()
+        plt.savefig(output_file, dpi=300, bbox_inches="tight")
+        print(f"Saved plot to {output_file}")
+
+    def print_summary(self):
+        """Print a summary of the analysis to stdout."""
+        for host, data in self.hosts_data.items():
+            print(f"\n{'=' * 60}")
+            print(f"Host: {host}")
+            print(f"Total boots: {data['boot_count']}")
+
+            if data["boot_times"]:
+                total_times = [bt["total"] for bt in data["boot_times"]]
+                stats = self.calculate_statistics(total_times)
+
+                print("\nBoot time statistics:")
+                print(f"  Samples analyzed: {len(total_times)}")
+                if stats:
+                    print(f"  Minimum: {stats['min']:.2f}s")
+                    print(f"  Maximum: {stats['max']:.2f}s")
+                    print(f"  Mean: {stats['mean']:.2f}s")
+                    print(f"  Median: {stats['median']:.2f}s")
+                    print(f"  StdDev: {stats['stdev']:.2f}s")
+                    print(f"  Range: {stats['max'] - stats['min']:.2f}s")
+
+                # Component breakdown
+                kernel_times = [bt["kernel"] for bt in data["boot_times"]]
+                initrd_times = [bt["initrd"] for bt in data["boot_times"]]
+                userspace_times = [bt["userspace"] for bt in data["boot_times"]]
+
+                print("\nComponent averages:")
+                print(f"  Kernel: {statistics.mean(kernel_times):.2f}s")
+                if any(t > 0 for t in initrd_times):
+                    print(f"  Initrd: {statistics.mean(initrd_times):.2f}s")
+                print(f"  Userspace: {statistics.mean(userspace_times):.2f}s")
+            else:
+                print("  No boot time data available")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Analyze reboot-limit workflow results"
+    )
+    parser.add_argument(
+        "results_dir",
+        nargs="?",
+        default="workflows/demos/reboot-limit/results",
+        help="Path to results directory (default: workflows/demos/reboot-limit/results)",
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        default="reboot_limit_analysis.png",
+        help="Output filename for plot (default: reboot_limit_analysis.png)",
+    )
+    parser.add_argument(
+        "--no-plot", action="store_true", help="Skip plotting, only show summary"
+    )
+
+    args = parser.parse_args()
+
+    # Check if results directory exists
+    if not os.path.exists(args.results_dir):
+        print(f"Error: Results directory '{args.results_dir}' not found")
+        sys.exit(1)
+
+    # Create analyzer and load data
+    analyzer = RebootLimitAnalyzer(args.results_dir)
+    analyzer.load_all_data()
+
+    if not analyzer.hosts_data:
+        print(f"No host data found in '{args.results_dir}'")
+        print("Make sure you've run 'make reboot-limit-baseline' first")
+        sys.exit(1)
+
+    # Print summary
+    analyzer.print_summary()
+
+    # Generate plots
+    if not args.no_plot:
+        try:
+            analyzer.plot_boot_times(args.output)
+        except ImportError:
+            print("\nWarning: matplotlib not installed. Install with:")
+            print("  pip install matplotlib")
+            print("Skipping plot generation.")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/workflows/demos/reboot-limit/generate_sample_data.py b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
new file mode 100755
index 00000000..0a481dc0
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Generate sample reboot-limit data for testing the visualization.
+This is only for testing purposes.
+"""
+
+import os
+import random
+from pathlib import Path
+
+
+def generate_sample_data(results_dir: str, num_hosts: int = 2, num_boots: int = 50):
+    """Generate sample systemd-analyze data for testing."""
+    results_path = Path(results_dir)
+
+    for i in range(num_hosts):
+        if i == 0:
+            host_name = "demo-reboot-limit"
+        else:
+            host_name = "demo-reboot-limit-dev"
+
+        host_dir = results_path / host_name
+        host_dir.mkdir(parents=True, exist_ok=True)
+
+        # Generate boot count
+        count_file = host_dir / "reboot-count.txt"
+        with open(count_file, "w") as f:
+            f.write(str(num_boots))
+
+        # Generate systemd-analyze data
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        with open(analyze_file, "w") as f:
+            for boot in range(num_boots):
+                # Generate realistic boot times with some variation
+                kernel_base = 2.5 + (
+                    0.1 if i == 0 else 0.15
+                )  # Dev might be slightly slower
+                initrd_base = 1.2 + (0.05 if i == 0 else 0.08)
+                userspace_base = 5.5 + (0.2 if i == 0 else 0.3)
+
+                # Add some random variation and occasional spikes
+                if boot % 10 == 0:  # Occasional slow boot
+                    spike = random.uniform(0.5, 2.0)
+                else:
+                    spike = 0
+
+                kernel_time = kernel_base + random.uniform(-0.3, 0.3) + spike * 0.3
+                initrd_time = initrd_base + random.uniform(-0.2, 0.2) + spike * 0.2
+                userspace_time = (
+                    userspace_base + random.uniform(-0.5, 0.5) + spike * 0.5
+                )
+
+                total_time = kernel_time + initrd_time + userspace_time
+
+                line = f"Startup finished in {kernel_time:.3f}s (kernel) + {initrd_time:.3f}s (initrd) + {userspace_time:.3f}s (userspace) = {total_time:.3f}s\n"
+                f.write(line)
+
+        print(f"Generated sample data for {host_name}")
+
+
+if __name__ == "__main__":
+    import sys
+
+    if len(sys.argv) > 1:
+        results_dir = sys.argv[1]
+    else:
+        results_dir = "workflows/demos/reboot-limit/results"
+
+    print(f"Generating sample data in {results_dir}")
+    generate_sample_data(results_dir)
+    print("Sample data generation complete")
diff --git a/workflows/demos/reboot-limit/Makefile b/workflows/demos/reboot-limit/Makefile
index f739d8ce..f1411daf 100644
--- a/workflows/demos/reboot-limit/Makefile
+++ b/workflows/demos/reboot-limit/Makefile
@@ -189,14 +189,23 @@ reboot-limit-dev-reset:
 		--tags vars,reset \
 		--extra-vars=@./extra_vars.yaml
 
+reboot-limit-results:
+	$(Q)echo "Analyzing reboot-limit results..."
+	$(Q)python3 scripts/workflows/demos/reboot-limit/analyze_results.py
+
+reboot-limit-graph: reboot-limit-results
+	$(Q)echo "Graph saved to reboot_limit_analysis.png"
+
 reboot-limit-help-menu:
 	@echo "reboot-limit options:"
 	@echo "reboot-limit                             - Sets up the /data/reboot-limit directory"
-	@echo "reboot-limit-baseline                    - Run the reboot-linit test on baseline hosts and collect results"
+	@echo "reboot-limit-baseline                    - Run the reboot-limit test on baseline hosts and collect results"
 	@echo "reboot-limit-baseline-reset              - Reset the test boot counter for baseline"
-	@echo "reboot-limit-dev                         - Run the reboot-limti test on dev hosts and collect results"
+	@echo "reboot-limit-dev                         - Run the reboot-limit test on dev hosts and collect results"
 	@echo "reboot-limit-baseline-loop               - Run the reboot-limit test in a loop until failure or steady state"
 	@echo "reboot-limit-baseline-kotd               - Run the reboot-limit kotd (kernel-of-the-day) loop"
+	@echo "reboot-limit-results                     - Analyze and summarize reboot-limit test results"
+	@echo "reboot-limit-graph                       - Generate graphs from reboot-limit test results"
 	@echo ""
 
 HELP_TARGETS += reboot-limit-help-menu
-- 
2.47.2

Thread overview: 26+ messages
2025-08-11 22:24 [PATCH 00/23] remove old kernel-ci and enhance reboot-limit Luis Chamberlain
2025-08-11 22:24 ` [PATCH 01/23] fstests: remove CONFIG_KERNEL_CI support Luis Chamberlain
2025-08-11 22:24 ` [PATCH 02/23] fstests: remove kernel-ci script symlinks Luis Chamberlain
2025-08-11 22:24 ` [PATCH 03/23] blktests: remove CONFIG_KERNEL_CI support Luis Chamberlain
2025-08-11 22:24 ` [PATCH 04/23] gitr: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 05/23] ltp: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 06/23] nfstest: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 07/23] pynfs: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 08/23] reboot-limit: convert CONFIG_KERNEL_CI to internal loop feature Luis Chamberlain
2025-08-11 22:24 ` [PATCH 09/23] kconfig: remove CONFIG_KERNEL_CI infrastructure Luis Chamberlain
2025-08-11 22:24 ` [PATCH 10/23] scripts: remove kernel-ci loop infrastructure Luis Chamberlain
2025-08-11 22:24 ` [PATCH 11/23] reboot-limit: simplify what gets selected Luis Chamberlain
2025-08-11 22:24 ` Luis Chamberlain [this message]
2025-08-11 22:24 ` [PATCH 13/23] reboot-limit: save graphs in organized results/graphs directory Luis Chamberlain
2025-08-11 22:24 ` [PATCH 14/23] docs: add comprehensive reboot-limit workflow documentation Luis Chamberlain
2025-08-11 22:24 ` [PATCH 15/23] reboot-limit: add kexec-tools dependency installation Luis Chamberlain
2025-08-11 22:24 ` [PATCH 16/23] reboot-limit: add A/B testing support targets Luis Chamberlain
2025-08-11 22:24 ` [PATCH 17/23] reboot-limit: fix kexec and reboot connection handling Luis Chamberlain
2025-08-11 22:24 ` [PATCH 18/23] reboot-limit: add COUNT parameter to override reboot count Luis Chamberlain
2025-08-11 22:24 ` [PATCH 19/23] reboot-limit: fix wait_for tasks using wrong host reference Luis Chamberlain
2025-08-11 22:24 ` [PATCH 20/23] reboot-limit: use ansible reboot module for all reboot types Luis Chamberlain
2025-08-11 22:24 ` [PATCH 21/23] reboot-limit: fix COUNT parameter to properly override reboot count Luis Chamberlain
2025-08-11 22:24 ` [PATCH 22/23] reboot-limit: handle empty dev group gracefully Luis Chamberlain
2025-08-11 22:24 ` [PATCH 23/23] reboot-limit: add kexec comparison feature Luis Chamberlain
2025-08-12 15:06 ` [PATCH 00/23] remove old kernel-ci and enhance reboot-limit Chuck Lever
2025-08-13  1:28   ` Luis Chamberlain
