From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH 12/23] reboot-limit: add graph visualization support for results
Date: Mon, 11 Aug 2025 15:24:39 -0700
Message-ID: <20250811222452.2213071-13-mcgrof@kernel.org>
In-Reply-To: <20250811222452.2213071-1-mcgrof@kernel.org>
References: <20250811222452.2213071-1-mcgrof@kernel.org>
X-Mailing-List: kdevops@lists.linux.dev

Add support to analyze and visualize reboot-limit workflow results. This
helps users understand boot performance trends and identify anomalies
across multiple reboots.

The implementation includes:
- analyze_results.py: Parses systemd-analyze output and generates graphs
  showing boot time trends, component breakdown (kernel/initrd/userspace),
  and statistical analysis
- generate_sample_data.py: Creates realistic test data for development
- New Makefile targets:
  - reboot-limit-results: Analyze and display summary statistics
  - reboot-limit-graph: Generate visualization graphs

The visualization provides:
- Stacked area charts showing boot component times
- Total boot time trends with statistical indicators (mean, median, stddev)
- Summary statistics for each host including min/max/range
- Support for multiple hosts (baseline and dev)

Generated-by: Claude AI
Signed-off-by: Luis Chamberlain
---
 .../demos/reboot-limit/analyze_results.py     | 304 ++++++++++++++++++
 .../reboot-limit/generate_sample_data.py      |  73 +++++
 workflows/demos/reboot-limit/Makefile         |  13 +-
 3 files changed, 388 insertions(+), 2 deletions(-)
 create mode 100755 scripts/workflows/demos/reboot-limit/analyze_results.py
 create mode 100755 scripts/workflows/demos/reboot-limit/generate_sample_data.py

diff --git a/scripts/workflows/demos/reboot-limit/analyze_results.py b/scripts/workflows/demos/reboot-limit/analyze_results.py
new file mode 100755
index 00000000..8842b409
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/analyze_results.py
@@ -0,0 +1,304 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Analyze and visualize reboot-limit workflow results.
+
+This script parses systemd-analyze output and generates graphs showing:
+- Boot time trends across reboots
+- Individual component times (kernel, initrd, userspace)
+- Statistical analysis of boot performance
+"""
+
+import os
+import sys
+import re
+import argparse
+import statistics
+from pathlib import Path
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+from typing import Dict, List, Tuple, Optional
+
+
+class RebootLimitAnalyzer:
+    """Analyzes reboot-limit workflow results."""
+
+    def __init__(self, results_dir: str):
+        self.results_dir = Path(results_dir)
+        self.hosts_data: Dict[str, Dict] = {}
+
+    def parse_systemd_analyze_line(self, line: str) -> Optional[Dict[str, float]]:
+        """
+        Parse a systemd-analyze output line.
+
+        Example line:
+        Startup finished in 2.345s (kernel) + 1.234s (initrd) + 5.678s (userspace) = 9.257s
+
+        Returns dict with times in seconds or None if parse fails.
+        """
+        # Pattern for systemd-analyze output
+        pattern = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(initrd\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        # Alternative pattern without initrd (for systems without initrd)
+        pattern_no_initrd = r"Startup finished in ([\d.]+)s \(kernel\) \+ ([\d.]+)s \(userspace\) = ([\d.]+)s"
+
+        match = re.search(pattern, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": float(match.group(2)),
+                "userspace": float(match.group(3)),
+                "total": float(match.group(4)),
+            }
+
+        match = re.search(pattern_no_initrd, line)
+        if match:
+            return {
+                "kernel": float(match.group(1)),
+                "initrd": 0.0,
+                "userspace": float(match.group(2)),
+                "total": float(match.group(3)),
+            }
+
+        return None
+
+    def load_host_data(self, host_dir: Path) -> Dict:
+        """Load and parse data for a single host."""
+        data = {"boot_count": 0, "boot_times": []}
+
+        # Read boot count
+        count_file = host_dir / "reboot-count.txt"
+        if count_file.exists():
+            with open(count_file, "r") as f:
+                content = f.read().strip()
+                if content:
+                    data["boot_count"] = int(content)
+
+        # Read systemd-analyze results
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        if analyze_file.exists():
+            with open(analyze_file, "r") as f:
+                for line in f:
+                    parsed = self.parse_systemd_analyze_line(line.strip())
+                    if parsed:
+                        data["boot_times"].append(parsed)
+
+        return data
+
+    def load_all_data(self):
+        """Load data for all hosts in the results directory."""
+        # Look for host directories
+        for item in self.results_dir.iterdir():
+            if item.is_dir():
+                self.hosts_data[item.name] = self.load_host_data(item)
+
+    def calculate_statistics(self, times: List[float]) -> Dict[str, float]:
+        """Calculate statistical measures for a list of times."""
+        if not times:
+            return {}
+
+        return {
+            "min": min(times),
+            "max": max(times),
+            "mean": statistics.mean(times),
+            "median": statistics.median(times),
+            "stdev": statistics.stdev(times) if len(times) > 1 else 0,
+        }
+
+    def plot_boot_times(self, output_file: str = "reboot_limit_analysis.png"):
+        """Generate plots for boot time analysis."""
+        if not self.hosts_data:
+            print("No data to plot")
+            return
+
+        # Create figure with subplots
+        num_hosts = len(self.hosts_data)
+        fig, axes = plt.subplots(num_hosts, 2, figsize=(14, 6 * num_hosts))
+
+        if num_hosts == 1:
+            axes = axes.reshape(1, -1)
+
+        for idx, (host, data) in enumerate(self.hosts_data.items()):
+            boot_times = data["boot_times"]
+            if not boot_times:
+                continue
+
+            # Extract time series
+            boot_numbers = list(range(1, len(boot_times) + 1))
+            kernel_times = [bt["kernel"] for bt in boot_times]
+            initrd_times = [bt["initrd"] for bt in boot_times]
+            userspace_times = [bt["userspace"] for bt in boot_times]
+            total_times = [bt["total"] for bt in boot_times]
+
+            # Plot 1: Stacked area chart of boot components
+            ax1 = axes[idx, 0]
+            ax1.fill_between(boot_numbers, 0, kernel_times, alpha=0.7, label="Kernel")
+            ax1.fill_between(
+                boot_numbers,
+                kernel_times,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                alpha=0.7,
+                label="Initrd",
+            )
+            ax1.fill_between(
+                boot_numbers,
+                [k + i for k, i in zip(kernel_times, initrd_times)],
+                total_times,
+                alpha=0.7,
+                label="Userspace",
+            )
+
+            ax1.set_xlabel("Boot Number")
+            ax1.set_ylabel("Time (seconds)")
+            ax1.set_title(f"{host}: Boot Component Times")
+            ax1.legend()
+            ax1.grid(True, alpha=0.3)
+            ax1.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Plot 2: Total boot time with statistics
+            ax2 = axes[idx, 1]
+            ax2.plot(
+                boot_numbers, total_times, "b-", linewidth=2, label="Total Boot Time"
+            )
+
+            # Add statistical lines
+            stats = self.calculate_statistics(total_times)
+            if stats:
+                ax2.axhline(
+                    y=stats["mean"],
+                    color="r",
+                    linestyle="--",
+                    label=f"Mean: {stats['mean']:.2f}s",
+                )
+                ax2.axhline(
+                    y=stats["median"],
+                    color="g",
+                    linestyle="--",
+                    label=f"Median: {stats['median']:.2f}s",
+                )
+
+                # Add standard deviation band
+                if stats["stdev"] > 0:
+                    ax2.fill_between(
+                        boot_numbers,
+                        stats["mean"] - stats["stdev"],
+                        stats["mean"] + stats["stdev"],
+                        alpha=0.2,
+                        color="gray",
+                        label=f"±1 StdDev: {stats['stdev']:.2f}s",
+                    )
+
+            ax2.set_xlabel("Boot Number")
+            ax2.set_ylabel("Time (seconds)")
+            ax2.set_title(f"{host}: Total Boot Time Analysis")
+            ax2.legend()
+            ax2.grid(True, alpha=0.3)
+            ax2.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
+
+            # Add text box with statistics
+            stats_text = f"Boots: {data['boot_count']}\n"
+            if stats:
+                stats_text += f"Min: {stats['min']:.2f}s\n"
+                stats_text += f"Max: {stats['max']:.2f}s\n"
+                stats_text += f"Range: {stats['max'] - stats['min']:.2f}s"
+
+            ax2.text(
+                0.02,
+                0.98,
+                stats_text,
+                transform=ax2.transAxes,
+                verticalalignment="top",
+                bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5),
+            )
+
+        plt.tight_layout()
+        plt.savefig(output_file, dpi=300, bbox_inches="tight")
+        print(f"Saved plot to {output_file}")
+
+    def print_summary(self):
+        """Print a summary of the analysis to stdout."""
+        for host, data in self.hosts_data.items():
+            print(f"\n{'=' * 60}")
+            print(f"Host: {host}")
+            print(f"Total boots: {data['boot_count']}")
+
+            if data["boot_times"]:
+                total_times = [bt["total"] for bt in data["boot_times"]]
+                stats = self.calculate_statistics(total_times)
+
+                print(f"\nBoot time statistics:")
+                print(f" Samples analyzed: {len(total_times)}")
+                if stats:
+                    print(f" Minimum: {stats['min']:.2f}s")
+                    print(f" Maximum: {stats['max']:.2f}s")
+                    print(f" Mean: {stats['mean']:.2f}s")
+                    print(f" Median: {stats['median']:.2f}s")
+                    print(f" StdDev: {stats['stdev']:.2f}s")
+                    print(f" Range: {stats['max'] - stats['min']:.2f}s")
+
+                # Component breakdown
+                kernel_times = [bt["kernel"] for bt in data["boot_times"]]
+                initrd_times = [bt["initrd"] for bt in data["boot_times"]]
+                userspace_times = [bt["userspace"] for bt in data["boot_times"]]
+
+                print(f"\nComponent averages:")
+                print(f" Kernel: {statistics.mean(kernel_times):.2f}s")
+                if any(t > 0 for t in initrd_times):
+                    print(f" Initrd: {statistics.mean(initrd_times):.2f}s")
+                print(f" Userspace: {statistics.mean(userspace_times):.2f}s")
+            else:
+                print(" No boot time data available")
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Analyze reboot-limit workflow results"
+    )
+    parser.add_argument(
+        "results_dir",
+        nargs="?",
+        default="workflows/demos/reboot-limit/results",
+        help="Path to results directory (default: workflows/demos/reboot-limit/results)",
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        default="reboot_limit_analysis.png",
+        help="Output filename for plot (default: reboot_limit_analysis.png)",
+    )
+    parser.add_argument(
+        "--no-plot", action="store_true", help="Skip plotting, only show summary"
+    )
+
+    args = parser.parse_args()
+
+    # Check if results directory exists
+    if not os.path.exists(args.results_dir):
+        print(f"Error: Results directory '{args.results_dir}' not found")
+        sys.exit(1)
+
+    # Create analyzer and load data
+    analyzer = RebootLimitAnalyzer(args.results_dir)
+    analyzer.load_all_data()
+
+    if not analyzer.hosts_data:
+        print(f"No host data found in '{args.results_dir}'")
+        print("Make sure you've run 'make reboot-limit-baseline' first")
+        sys.exit(1)
+
+    # Print summary
+    analyzer.print_summary()
+
+    # Generate plots
+    if not args.no_plot:
+        try:
+            analyzer.plot_boot_times(args.output)
+        except ImportError:
+            print("\nWarning: matplotlib not installed. Install with:")
+            print(" pip install matplotlib")
+            print("Skipping plot generation.")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/workflows/demos/reboot-limit/generate_sample_data.py b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
new file mode 100755
index 00000000..0a481dc0
--- /dev/null
+++ b/scripts/workflows/demos/reboot-limit/generate_sample_data.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: copyleft-next-0.3.1
+
+"""
+Generate sample reboot-limit data for testing the visualization.
+This is only for testing purposes.
+"""
+
+import os
+import random
+from pathlib import Path
+
+
+def generate_sample_data(results_dir: str, num_hosts: int = 2, num_boots: int = 50):
+    """Generate sample systemd-analyze data for testing."""
+    results_path = Path(results_dir)
+
+    for i in range(num_hosts):
+        if i == 0:
+            host_name = "demo-reboot-limit"
+        else:
+            host_name = f"demo-reboot-limit-dev"
+
+        host_dir = results_path / host_name
+        host_dir.mkdir(parents=True, exist_ok=True)
+
+        # Generate boot count
+        count_file = host_dir / "reboot-count.txt"
+        with open(count_file, "w") as f:
+            f.write(str(num_boots))
+
+        # Generate systemd-analyze data
+        analyze_file = host_dir / "systemctl-analyze.txt"
+        with open(analyze_file, "w") as f:
+            for boot in range(num_boots):
+                # Generate realistic boot times with some variation
+                kernel_base = 2.5 + (
+                    0.1 if i == 0 else 0.15
+                )  # Dev might be slightly slower
+                initrd_base = 1.2 + (0.05 if i == 0 else 0.08)
+                userspace_base = 5.5 + (0.2 if i == 0 else 0.3)
+
+                # Add some random variation and occasional spikes
+                if boot % 10 == 0:  # Occasional slow boot
+                    spike = random.uniform(0.5, 2.0)
+                else:
+                    spike = 0
+
+                kernel_time = kernel_base + random.uniform(-0.3, 0.3) + spike * 0.3
+                initrd_time = initrd_base + random.uniform(-0.2, 0.2) + spike * 0.2
+                userspace_time = (
+                    userspace_base + random.uniform(-0.5, 0.5) + spike * 0.5
+                )
+
+                total_time = kernel_time + initrd_time + userspace_time
+
+                line = f"Startup finished in {kernel_time:.3f}s (kernel) + {initrd_time:.3f}s (initrd) + {userspace_time:.3f}s (userspace) = {total_time:.3f}s\n"
+                f.write(line)
+
+        print(f"Generated sample data for {host_name}")
+
+
+if __name__ == "__main__":
+    import sys
+
+    if len(sys.argv) > 1:
+        results_dir = sys.argv[1]
+    else:
+        results_dir = "workflows/demos/reboot-limit/results"
+
+    print(f"Generating sample data in {results_dir}")
+    generate_sample_data(results_dir)
+    print("Sample data generation complete")
diff --git a/workflows/demos/reboot-limit/Makefile b/workflows/demos/reboot-limit/Makefile
index f739d8ce..f1411daf 100644
--- a/workflows/demos/reboot-limit/Makefile
+++ b/workflows/demos/reboot-limit/Makefile
@@ -189,14 +189,23 @@ reboot-limit-dev-reset:
 	--tags vars,reset \
 	--extra-vars=@./extra_vars.yaml
 
+reboot-limit-results:
+	$(Q)echo "Analyzing reboot-limit results..."
+	$(Q)python3 scripts/workflows/demos/reboot-limit/analyze_results.py
+
+reboot-limit-graph: reboot-limit-results
+	$(Q)echo "Graph saved to reboot_limit_analysis.png"
+
 reboot-limit-help-menu:
 	@echo "reboot-limit options:"
 	@echo "reboot-limit - Sets up the /data/reboot-limit directory"
-	@echo "reboot-limit-baseline - Run the reboot-linit test on baseline hosts and collect results"
+	@echo "reboot-limit-baseline - Run the reboot-limit test on baseline hosts and collect results"
 	@echo "reboot-limit-baseline-reset - Reset the test boot counter for baseline"
-	@echo "reboot-limit-dev - Run the reboot-limti test on dev hosts and collect results"
+	@echo "reboot-limit-dev - Run the reboot-limit test on dev hosts and collect results"
 	@echo "reboot-limit-baseline-loop - Run the reboot-limit test in a loop until failure or steady state"
 	@echo "reboot-limit-baseline-kotd - Run the reboot-limit kotd (kernel-of-the-day) loop"
+	@echo "reboot-limit-results - Analyze and summarize reboot-limit test results"
+	@echo "reboot-limit-graph - Generate graphs from reboot-limit test results"
 	@echo ""
 
 HELP_TARGETS += reboot-limit-help-menu
-- 
2.47.2
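
A note on parse_systemd_analyze_line(): both regular expressions in the patch only accept values written as plain seconds ("2.345s") and only the kernel/initrd/userspace stages. As far as I know, systemd-analyze can also print values such as "853ms" or "1min 2.345s", and may list extra stages (for example firmware and loader) on some systems; lines like that would be silently skipped by the patch as posted. Below is a minimal sketch of a more tolerant parser, not a drop-in replacement. The names span_to_seconds() and parse_startup_line() are hypothetical, and the set of unit suffixes is an assumption that should be checked against real output from the target hosts.

```python
import re
from typing import Dict, Optional

# Assumed unit suffixes; verify against real systemd-analyze output.
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
_UNITS = r"(?:min|ms|h|s)"

# One "<number><unit>" token, e.g. "1min" or "2.345s".
_TOKEN_RE = re.compile(r"([\d.]+)(min|ms|h|s)")
# One or more tokens followed by "(stage)", e.g. "1min 2.345s (userspace)".
_STAGE_RE = re.compile(r"((?:[\d.]+" + _UNITS + r"\s*)+)\((\w+)\)")
# The total after the "=" sign.
_TOTAL_RE = re.compile(r"=\s*((?:[\d.]+" + _UNITS + r"\s*)+)")


def span_to_seconds(span: str) -> float:
    """Convert a time span such as '1min 2.345s' to seconds."""
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _TOKEN_RE.findall(span))


def parse_startup_line(line: str) -> Optional[Dict[str, float]]:
    """Parse a 'Startup finished in ...' line into per-stage seconds.

    Returns None if the line is not a startup summary or lacks the
    kernel or userspace stage.
    """
    if "Startup finished in" not in line:
        return None

    stages = {name: span_to_seconds(span)
              for span, name in _STAGE_RE.findall(line)}
    if "kernel" not in stages or "userspace" not in stages:
        return None

    # Keep the same keys the patch uses so plotting code can stay unchanged.
    stages.setdefault("initrd", 0.0)

    total = _TOTAL_RE.search(line)
    if total:
        stages["total"] = span_to_seconds(total.group(1))
    else:
        stages["total"] = sum(stages.values())
    return stages


if __name__ == "__main__":
    print(parse_startup_line(
        "Startup finished in 853ms (kernel) + 1min 2.500s (userspace) = 1min 3.353s"
    ))
```

The returned dict keeps the "kernel", "initrd", "userspace" and "total" keys used elsewhere in analyze_results.py, so any additional stages simply appear as extra keys that the existing plotting code would ignore.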