From: Luis Chamberlain <mcgrof@kernel.org>
To: Chuck Lever <cel@kernel.org>, Daniel Gomez <da.gomez@kruces.com>,
    kdevops@lists.linux.dev
Cc: Luis Chamberlain <mcgrof@kernel.org>
Subject: [PATCH 12/23] reboot-limit: add graph visualization support for results
Date: Mon, 11 Aug 2025 15:24:39 -0700
Message-ID: <20250811222452.2213071-13-mcgrof@kernel.org>
In-Reply-To: <20250811222452.2213071-1-mcgrof@kernel.org>

Add support to analyze and visualize reboot-limit workflow results. This
helps users understand boot performance trends and identify anomalies
across multiple reboots.

The implementation includes:
- analyze_results.py: Parses systemd-analyze output and generates graphs
  showing boot time trends, component breakdown (kernel/initrd/userspace),
  and statistical analysis
- generate_sample_data.py: Creates realistic test data for development
- New Makefile targets:
  - reboot-limit-results: Analyze and display summary statistics
  - reboot-limit-graph: Generate visualization graphs

The visualization provides:
- Stacked area charts showing boot component times
- Total boot time trends with statistical indicators (mean, median, stddev)
- Summary statistics for each host including min/max/range
- Support for multiple hosts (baseline and dev)

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
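
Note for reviewers: a minimal standalone sketch of how the parser in
analyze_results.py is expected to behave on the two line shapes it
handles. The name `parse_line` and the module-level pattern constants are
hypothetical; the regexes themselves are copied from the patch. The last
comment records an assumption worth checking on real hosts: systemd-analyze
can also print firmware/loader stages and "min"/"ms" units, and such lines
would not match these patterns.

```python
import re
from typing import Dict, Optional

# Same two shapes the patch handles: with and without an initrd stage.
WITH_INITRD = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(initrd\) "
    r"\+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
)
NO_INITRD = re.compile(
    r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) "
    r"= ([\d.]+)s"
)


def parse_line(line: str) -> Optional[Dict[str, float]]:
    """Return per-stage times in seconds, or None if the line does not match."""
    m = WITH_INITRD.search(line)
    if m:
        kernel, initrd, userspace, total = map(float, m.groups())
        return {
            "kernel": kernel,
            "initrd": initrd,
            "userspace": userspace,
            "total": total,
        }
    m = NO_INITRD.search(line)
    if m:
        kernel, userspace, total = map(float, m.groups())
        # No initrd stage reported: record it as zero so plots still stack.
        return {
            "kernel": kernel,
            "initrd": 0.0,
            "userspace": userspace,
            "total": total,
        }
    # Anything else (for example a line using "1min 5.678s") is skipped.
    return None
```

Lines that return None are silently dropped by load_host_data(), so the
number of samples analyzed can be lower than the value in reboot-count.txt.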
.../demos/reboot-limit/analyze_results.py | 304 ++++++++++++++++++
.../reboot-limit/generate_sample_data.py | 73 +++++
workflows/demos/reboot-limit/Makefile | 13 +-
3 files changed, 388 insertions(+), 2 deletions(-)
create mode 100755 scripts/workflows/demos/reboot-limit/analyze_results.py
create mode 100755 scripts/workflows/demos/reboot-limit/generate_sample_data.py
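
Note for reviewers: a small sketch of the summary statistics that
calculate_statistics() and print_summary() report for one host. The helper
name `summarize` is hypothetical and adds a "range" key for convenience;
the patch computes the range inline when printing. It relies on
statistics.stdev() being the sample standard deviation, which needs at
least two data points, hence the guard for a single boot.

```python
import statistics
from typing import Dict, List


def summarize(times: List[float]) -> Dict[str, float]:
    """Summarize total boot times (seconds) for one host."""
    if not times:
        return {}
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        # stdev() raises StatisticsError on fewer than two samples.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "range": max(times) - min(times),
    }


if __name__ == "__main__":
    print(summarize([9.0, 10.0, 11.0]))
```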
diff --git a/scripts/workflows/demos/reboot-limit/analyze_results.py b/scripts/workflows/demos/reboot-limit/analyze_results.py
new file mode 100755
index 00000000..8842b409
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/analyze_results.py
@@ -0,0 +1,304 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Analyze and visualize reboot-limit workflow results.
+
+This script parses systemd-analyze output and generates graphs showing:
+- Boot time trends across reboots
+- Individual component times (kernel, initrd, userspace)
+- Statistical analysis of boot performance
+"""
+
+import os
+import sys
+import re
+import argparse
+import statistics
+from pathlib import Path
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+from typing import Dict, List, Tuple, Optional
+
+
+class RebootLimitAnalyzer:
+    """Analyzes reboot-limit workflow results."""
+
+    def __init__(self, results_dir: str):
+        self.results_dir = Path(results_dir)
+        self.hosts_data: Dict[str, Dict] = {}
+
+    def parse_systemd_analyze_line(self, line: str) -> Optional[Dict[str, float]]:
+        """
+        Parse a systemd-analyze output line.
+
+        Example line:
+        Startup finished in 2.345s (kernel) + 1.234s (initrd) + 5.678s (userspace) = 9.257s
+
+        Returns dict with times in seconds or None if parse fails.
+        """
+        # Pattern for systemd-analyze output
+        pattern = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(initrd\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        # Alternative pattern without initrd (for systems without initrd)
+        pattern_no_initrd = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        match = re.search(pattern, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": float(match.group(2)),
+                "userspace": float(match.group(3)),
+                "total": float(match.group(4)),
+            }
+
+        match = re.search(pattern_no_initrd, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": 0.0,
+                "userspace": float(match.group(2)),
+                "total": float(match.group(3)),
+            }
+
+        return None
+
+    def load_host_data(self, host_dir: Path) -> Dict:
+        """Load and parse data for a single host."""
+        data = {"boot_count": 0, "boot_times": []}
+
+        # Read boot count
+        count_file = host_dir / "reboot-count.txt"
+        if count_file.exists():
+            with open(count_file, "r") as f:
+                content = f.read().strip()
+                if content:
+                    data["boot_count"] = int(content)
+
+        # Read systemd-analyze results
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        if analyze_file.exists():
+            with open(analyze_file, "r") as f:
+                for line in f:
+                    parsed = self.parse_systemd_analyze_line(line.strip())
+                    if parsed:
+                        data["boot_times"].append(parsed)
+
+        return data
+
+    def load_all_data(self):
+        """Load data for all hosts in the results directory."""
+        # Look for host directories
+        for item in self.results_dir.iterdir():
+            if item.is_dir():
+                self.hosts_data[item.name] = self.load_host_data(item)
+
+    def calculate_statistics(self, times: List[float]) -> Dict[str, float]:
+        """Calculate statistical measures for a list of times."""
+        if not times:
+            return {}
+
+        return {
+            "min": min(times),
+            "max": max(times),
+            "mean": statistics.mean(times),
+            "median": statistics.median(times),
+            "stdev": statistics.stdev(times) if len(times) > 1 else 0,
+        }
+
+    def plot_boot_times(self, output_file: str = "reboot_limit_analysis.png"):
+        """Generate plots for boot time analysis."""
+        if not self.hosts_data:
+            print("No data to plot")
+            return
+
+        # Create figure with subplots
+        num_hosts = len(self.hosts_data)
+        fig, axes = plt.subplots(num_hosts, 2, figsize=(14, 6 * num_hosts))
+
+        if num_hosts == 1:
+            axes = axes.reshape(1, -1)
+
+        for idx, (host, data) in enumerate(self.hosts_data.items()):
+            boot_times = data["boot_times"]
+            if not boot_times:
+                continue
+
+            # Extract time series
+            boot_numbers = list(range(1, len(boot_times) + 1))
+            kernel_times = [bt["kernel"] for bt in boot_times]
+            initrd_times = [bt["initrd"] for bt in boot_times]
+            userspace_times = [bt["userspace"] for bt in boot_times]
+            total_times = [bt["total"] for bt in boot_times]
+
+            # Plot 1: Stacked area chart of boot components
+            ax1 = axes[idx, 0]
+            ax1.fill_between(boot_numbers, 0, kernel_times, alpha=0.7, label="Kernel")
+            ax1.fill_between(
+                boot_numbers,
+                kernel_times,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                alpha=0.7,
+                label="Initrd",
+            )
+            ax1.fill_between(
+                boot_numbers,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                total_times,
+                alpha=0.7,
+                label="Userspace",
+            )
+
+            ax1.set_xlabel("Boot Number")
+            ax1.set_ylabel("Time (seconds)")
+            ax1.set_title(f"{host}: Boot Component Times")
+            ax1.legend()
+            ax1.grid(True, alpha=0.3)
+            ax1.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Plot 2: Total boot time with statistics
+            ax2 = axes[idx, 1]
+            ax2.plot(
+                boot_numbers, total_times, "b-", linewidth=2, label="Total Boot Time"
+            )
+
+            # Add statistical lines
+            stats = self.calculate_statistics(total_times)
+            if stats:
+                ax2.axhline(
+                    y=stats["mean"],
+                    color="r",
+                    linestyle="--",
+                    label=f"Mean: {stats['mean']:.2f}s",
+                )
+                ax2.axhline(
+                    y=stats["median"],
+                    color="g",
+                    linestyle="--",
+                    label=f"Median: {stats['median']:.2f}s",
+                )
+
+                # Add standard deviation band
+                if stats["stdev"] > 0:
+                    ax2.fill_between(
+                        boot_numbers,
+                        stats["mean"] - stats["stdev"],
+                        stats["mean"] + stats["stdev"],
+                        alpha=0.2,
+                        color="gray",
+                        label=f"±1 StdDev: {stats['stdev']:.2f}s",
+                    )
+
+            ax2.set_xlabel("Boot Number")
+            ax2.set_ylabel("Time (seconds)")
+            ax2.set_title(f"{host}: Total Boot Time Analysis")
+            ax2.legend()
+            ax2.grid(True, alpha=0.3)
+            ax2.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Add text box with statistics
+            stats_text = f"Boots: {data['boot_count']}\n"
+            if stats:
+                stats_text += f"Min: {stats['min']:.2f}s\n"
+                stats_text += f"Max: {stats['max']:.2f}s\n"
+                stats_text += f"Range: {stats['max'] - stats['min']:.2f}s"
+
+            ax2.text(
+                0.02,
+                0.98,
+                stats_text,
+                transform=ax2.transAxes,
+                verticalalignment="top",
+                bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5),
+            )
+
+        plt.tight_layout()
+        plt.savefig(output_file, dpi=300, bbox_inches="tight")
+        print(f"Saved plot to {output_file}")
+
+    def print_summary(self):
+        """Print a summary of the analysis to stdout."""
+        for host, data in self.hosts_data.items():
+            print(f"\n{'=' * 60}")
+            print(f"Host: {host}")
+            print(f"Total boots: {data['boot_count']}")
+
+            if data["boot_times"]:
+                total_times = [bt["total"] for bt in data["boot_times"]]
+                stats = self.calculate_statistics(total_times)
+
+                print(f"\nBoot time statistics:")
+                print(f" Samples analyzed: {len(total_times)}")
+                if stats:
+                    print(f" Minimum: {stats['min']:.2f}s")
+                    print(f" Maximum: {stats['max']:.2f}s")
+                    print(f" Mean: {stats['mean']:.2f}s")
+                    print(f" Median: {stats['median']:.2f}s")
+                    print(f" StdDev: {stats['stdev']:.2f}s")
+                    print(f" Range: {stats['max'] - stats['min']:.2f}s")
+
+                # Component breakdown
+                kernel_times = [bt["kernel"] for bt in data["boot_times"]]
+                initrd_times = [bt["initrd"] for bt in data["boot_times"]]
+                userspace_times = [bt["userspace"] for bt in data["boot_times"]]
+
+                print(f"\nComponent averages:")
+                print(f" Kernel: {statistics.mean(kernel_times):.2f}s")
+                if any(t > 0 for t in initrd_times):
+                    print(f" Initrd: {statistics.mean(initrd_times):.2f}s")
+                print(f" Userspace: {statistics.mean(userspace_times):.2f}s")
+            else:
+                print(" No boot time data available")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Analyze reboot-limit workflow results"
+    )
+    parser.add_argument(
+        "results_dir",
+        nargs="?",
+        default="workflows/demos/reboot-limit/results",
+        help="Path to results directory (default: workflows/demos/reboot-limit/results)",
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        default="reboot_limit_analysis.png",
+        help="Output filename for plot (default: reboot_limit_analysis.png)",
+    )
+    parser.add_argument(
+        "--no-plot", action="store_true", help="Skip plotting, only show summary"
+    )
+
+    args = parser.parse_args()
+
+    # Check if results directory exists
+    if not os.path.exists(args.results_dir):
+        print(f"Error: Results directory '{args.results_dir}' not found")
+        sys.exit(1)
+
+    # Create analyzer and load data
+    analyzer = RebootLimitAnalyzer(args.results_dir)
+    analyzer.load_all_data()
+
+    if not analyzer.hosts_data:
+        print(f"No host data found in '{args.results_dir}'")
+        print("Make sure you've run 'make reboot-limit-baseline' first")
+        sys.exit(1)
+
+    # Print summary
+    analyzer.print_summary()
+
+    # Generate plots
+    if not args.no_plot:
+        try:
+            analyzer.plot_boot_times(args.output)
+        except ImportError:
+            print("\nWarning: matplotlib not installed. Install with:")
+            print(" pip install matplotlib")
+            print("Skipping plot generation.")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/workflows/demos/reboot-limit/generate_sample_data.py b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
new file mode 100755
index 00000000..0a481dc0
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Generate sample reboot-limit data for testing the visualization.
+This is only for testing purposes.
+"""
+
+import os
+import random
+from pathlib import Path
+
+
+def generate_sample_data(results_dir: str, num_hosts: int = 2, num_boots: int = 50):
+    """Generate sample systemd-analyze data for testing."""
+    results_path = Path(results_dir)
+
+    for i in range(num_hosts):
+        if i == 0:
+            host_name = "demo-reboot-limit"
+        else:
+            host_name = f"demo-reboot-limit-dev"
+
+        host_dir = results_path / host_name
+        host_dir.mkdir(parents=True, exist_ok=True)
+
+        # Generate boot count
+        count_file = host_dir / "reboot-count.txt"
+        with open(count_file, "w") as f:
+            f.write(str(num_boots))
+
+        # Generate systemd-analyze data
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        with open(analyze_file, "w") as f:
+            for boot in range(num_boots):
+                # Generate realistic boot times with some variation
+                kernel_base = 2.5 + (
+                    0.1 if i == 0 else 0.15
+                )  # Dev might be slightly slower
+                initrd_base = 1.2 + (0.05 if i == 0 else 0.08)
+                userspace_base = 5.5 + (0.2 if i == 0 else 0.3)
+
+                # Add some random variation and occasional spikes
+                if boot % 10 == 0:  # Occasional slow boot
+                    spike = random.uniform(0.5, 2.0)
+                else:
+                    spike = 0
+
+                kernel_time = kernel_base + random.uniform(-0.3, 0.3) + spike * 0.3
+                initrd_time = initrd_base + random.uniform(-0.2, 0.2) + spike * 0.2
+                userspace_time = (
+                    userspace_base + random.uniform(-0.5, 0.5) + spike * 0.5
+                )
+
+                total_time = kernel_time + initrd_time + userspace_time
+
+                line = f"Startup finished in {kernel_time:.3f}s (kernel) + {initrd_time:.3f}s (initrd) + {userspace_time:.3f}s (userspace) = {total_time:.3f}s\n"
+                f.write(line)
+
+        print(f"Generated sample data for {host_name}")
+
+
+if __name__ == "__main__":
+    import sys
+
+    if len(sys.argv) > 1:
+        results_dir = sys.argv[1]
+    else:
+        results_dir = "workflows/demos/reboot-limit/results"
+
+    print(f"Generating sample data in {results_dir}")
+    generate_sample_data(results_dir)
+    print("Sample data generation complete")
diff --git a/workflows/demos/reboot-limit/Makefile b/workflows/demos/reboot-limit/Makefile
index f739d8ce..f1411daf 100644
--- a/workflows/demos/reboot-limit/Makefile
+++ b/workflows/demos/reboot-limit/Makefile
@@ -189,14 +189,23 @@ reboot-limit-dev-reset:
 		--tags vars,reset \
 		--extra-vars=@./extra_vars.yaml
 
+reboot-limit-results:
+	$(Q)echo "Analyzing reboot-limit results..."
+	$(Q)python3 scripts/workflows/demos/reboot-limit/analyze_results.py
+
+reboot-limit-graph: reboot-limit-results
+	$(Q)echo "Graph saved to reboot_limit_analysis.png"
+
 reboot-limit-help-menu:
 	@echo "reboot-limit options:"
 	@echo "reboot-limit - Sets up the /data/reboot-limit directory"
-	@echo "reboot-limit-baseline - Run the reboot-linit test on baseline hosts and collect results"
+	@echo "reboot-limit-baseline - Run the reboot-limit test on baseline hosts and collect results"
 	@echo "reboot-limit-baseline-reset - Reset the test boot counter for baseline"
-	@echo "reboot-limit-dev - Run the reboot-limti test on dev hosts and collect results"
+	@echo "reboot-limit-dev - Run the reboot-limit test on dev hosts and collect results"
 	@echo "reboot-limit-baseline-loop - Run the reboot-limit test in a loop until failure or steady state"
 	@echo "reboot-limit-baseline-kotd - Run the reboot-limit kotd (kernel-of-the-day) loop"
+	@echo "reboot-limit-results - Analyze and summarize reboot-limit test results"
+	@echo "reboot-limit-graph - Generate graphs from reboot-limit test results"
 	@echo ""
 
 HELP_TARGETS += reboot-limit-help-menu
--
2.47.2
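
Note for reviewers: a minimal sketch of the band boundaries behind the
stacked area chart drawn in plot_boot_times(). `stack_bounds` is a
hypothetical helper, not part of this patch. It mirrors the three
fill_between() calls: the kernel band spans 0 to kernel, the initrd band
spans kernel to kernel + initrd, and the userspace band is drawn up to the
reported total rather than to the sum of the three components.

```python
from typing import Dict, List, Tuple


def stack_bounds(boots: List[Dict[str, float]]) -> List[Tuple[float, float, float]]:
    """Return (kernel_top, initrd_top, userspace_top) for each boot."""
    bounds = []
    for bt in boots:
        kernel_top = bt["kernel"]
        initrd_top = kernel_top + bt["initrd"]
        # The top band ends at the reported total, so any difference between
        # the total and the component sum ends up in the userspace band.
        userspace_top = bt["total"]
        bounds.append((kernel_top, initrd_top, userspace_top))
    return bounds


if __name__ == "__main__":
    sample = [
        {"kernel": 2.0, "initrd": 1.0, "userspace": 5.0, "total": 8.0},
        {"kernel": 2.5, "initrd": 0.0, "userspace": 5.5, "total": 8.0},
    ]
    for row in stack_bounds(sample):
        print(row)
```

On hosts without an initrd the middle band has zero height, which matches
the 0.0 that the parser records for that stage.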
Thread overview: 26+ messages
2025-08-11 22:24 [PATCH 00/23] remove old kernel-ci and enhance reboot-limit Luis Chamberlain
2025-08-11 22:24 ` [PATCH 01/23] fstests: remove CONFIG_KERNEL_CI support Luis Chamberlain
2025-08-11 22:24 ` [PATCH 02/23] fstests: remove kernel-ci script symlinks Luis Chamberlain
2025-08-11 22:24 ` [PATCH 03/23] blktests: remove CONFIG_KERNEL_CI support Luis Chamberlain
2025-08-11 22:24 ` [PATCH 04/23] gitr: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 05/23] ltp: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 06/23] nfstest: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 07/23] pynfs: " Luis Chamberlain
2025-08-11 22:24 ` [PATCH 08/23] reboot-limit: convert CONFIG_KERNEL_CI to internal loop feature Luis Chamberlain
2025-08-11 22:24 ` [PATCH 09/23] kconfig: remove CONFIG_KERNEL_CI infrastructure Luis Chamberlain
2025-08-11 22:24 ` [PATCH 10/23] scripts: remove kernel-ci loop infrastructure Luis Chamberlain
2025-08-11 22:24 ` [PATCH 11/23] reboot-limit: simplify what gets selected Luis Chamberlain
2025-08-11 22:24 ` Luis Chamberlain [this message]
2025-08-11 22:24 ` [PATCH 13/23] reboot-limit: save graphs in organized results/graphs directory Luis Chamberlain
2025-08-11 22:24 ` [PATCH 14/23] docs: add comprehensive reboot-limit workflow documentation Luis Chamberlain
2025-08-11 22:24 ` [PATCH 15/23] reboot-limit: add kexec-tools dependency installation Luis Chamberlain
2025-08-11 22:24 ` [PATCH 16/23] reboot-limit: add A/B testing support targets Luis Chamberlain
2025-08-11 22:24 ` [PATCH 17/23] reboot-limit: fix kexec and reboot connection handling Luis Chamberlain
2025-08-11 22:24 ` [PATCH 18/23] reboot-limit: add COUNT parameter to override reboot count Luis Chamberlain
2025-08-11 22:24 ` [PATCH 19/23] reboot-limit: fix wait_for tasks using wrong host reference Luis Chamberlain
2025-08-11 22:24 ` [PATCH 20/23] reboot-limit: use ansible reboot module for all reboot types Luis Chamberlain
2025-08-11 22:24 ` [PATCH 21/23] reboot-limit: fix COUNT parameter to properly override reboot count Luis Chamberlain
2025-08-11 22:24 ` [PATCH 22/23] reboot-limit: handle empty dev group gracefully Luis Chamberlain
2025-08-11 22:24 ` [PATCH 23/23] reboot-limit: add kexec comparison feature Luis Chamberlain
2025-08-12 15:06 ` [PATCH 00/23] remove old kernel-ci and enhance reboot-limit Chuck Lever
2025-08-13 1:28 ` Luis Chamberlain