From: Luis Chamberlain
To: Chuck Lever, Daniel Gomez, kdevops@lists.linux.dev
Cc: Luis Chamberlain
Subject: [PATCH 1/5] monitoring: add memory fragmentation eBPF monitoring support
Date: Thu, 4 Sep 2025 02:13:17 -0700
Message-ID: <20250904091322.2499058-2-mcgrof@kernel.org>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250904091322.2499058-1-mcgrof@kernel.org>
References: <20250904091322.2499058-1-mcgrof@kernel.org>

Add support for memory fragmentation monitoring using eBPF-based
tracking, leveraging the plot-fragmentation effort [0]. This provides
real-time tracking of memory allocation events and fragmentation
indices with matplotlib visualization.

Features:

- eBPF tracepoint-based fragmentation tracking
- Real-time fragmentation index monitoring
- Automatic plot generation with fragmentation_visualizer.py
- Configurable monitoring duration and output directory
- Integration with the existing monitoring framework

For simplicity, the scripts are included directly in kdevops rather
than cloned from an external repository.
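The tracker saves each capture as JSON with top-level "metadata", "events", and "statistics" keys (see save_data() in fragmentation_tracker.py below). A minimal consumer sketch; the `capture` dict here is a hand-built stand-in for a real capture file, using the field names the script emits:

```python
from collections import defaultdict

def extfrag_counts_by_order(capture):
    """Tally external-fragmentation events per allocation order,
    mirroring the per-order summary in fragmentation_tracker.py."""
    counts = defaultdict(int)
    for e in capture.get("events", []):
        if e["event_type"] == "extfrag":
            counts[e["order"]] += 1
    return dict(counts)

# Hand-built stand-in for a capture produced by save_data().
capture = {
    "metadata": {"total_events": 3},
    "events": [
        {"event_type": "extfrag", "order": 0, "is_steal": True},
        {"event_type": "extfrag", "order": 0, "is_steal": False},
        {"event_type": "compaction_success", "order": 4},
    ],
}
print(extfrag_counts_by_order(capture))  # {0: 2}
```

The same loop works unchanged on a file loaded with `json.load()`, since save_data() writes exactly this layout.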
Generated-by: Claude AI
Link: https://github.com/mcgrof/plot-fragmentation # [0]
Signed-off-by: Luis Chamberlain
---
 kconfigs/monitors/Kconfig                          |   53 +
 .../tasks/install-deps/debian/main.yml             |    1 +
 .../tasks/install-deps/redhat/main.yml             |    1 +
 .../fstests/tasks/install-deps/suse/main.yml       |    1 +
 .../monitoring/files/fragmentation_tracker.py      |  533 ++++++++
 .../files/fragmentation_visualizer.py              | 1161 +++++++++++++++++
 .../monitoring/tasks/monitor_collect.yml           |  126 ++
 .../roles/monitoring/tasks/monitor_run.yml         |  123 ++
 8 files changed, 1999 insertions(+)
 create mode 100644 playbooks/roles/monitoring/files/fragmentation_tracker.py
 create mode 100644 playbooks/roles/monitoring/files/fragmentation_visualizer.py

diff --git a/kconfigs/monitors/Kconfig b/kconfigs/monitors/Kconfig
index 6dc1ddbdd2e9..bd4dc81fa11d 100644
--- a/kconfigs/monitors/Kconfig
+++ b/kconfigs/monitors/Kconfig
@@ -61,6 +61,59 @@ config MONITOR_FOLIO_MIGRATION_INTERVAL
	  performance. Higher values reduce overhead but may miss
	  short-lived migration events.

+config MONITOR_MEMORY_FRAGMENTATION
+	bool "Monitor memory fragmentation with eBPF"
+	output yaml
+	default n
+	help
+	  Enable monitoring of memory fragmentation using eBPF-based tracking.
+	  This provides advanced memory fragmentation visualization using
+	  eBPF tracepoints and matplotlib.
+
+	  This tool tracks memory allocation events and fragmentation indices
+	  in real-time, providing insights that traditional methods like
+	  /proc/pagetypeinfo cannot fully capture.
+
+	  Features:
+	  - eBPF-based tracepoint tracking
+	  - Real-time fragmentation index monitoring
+	  - Page mobility tracking
+	  - Matplotlib visualization of fragmentation data
+
+	  Requirements:
+	  - Python 3 with python3-bpfcc
+	  - Kernel with required tracepoint support
+	  - Root privileges for eBPF attachment
+
+	  The tool is particularly useful for investigating whether Large Block
+	  Size support in the kernel creates worse fragmentation.
+ +config MONITOR_FRAGMENTATION_DURATION + int "Fragmentation monitoring duration (seconds)" + output yaml + default 0 + depends on MONITOR_MEMORY_FRAGMENTATION + help + Duration to run fragmentation monitoring in seconds. + Set to 0 for continuous monitoring until workflow completion. + + The monitoring will automatically stop when the workflow + finishes or when this duration expires, whichever comes first. + +config MONITOR_FRAGMENTATION_OUTPUT_DIR + string "Fragmentation monitoring output directory" + output yaml + default "/root/monitoring/fragmentation" + depends on MONITOR_MEMORY_FRAGMENTATION + help + Directory where fragmentation monitoring data and plots will be stored. + This directory will be created if it doesn't exist. + + The collected data includes: + - Raw eBPF trace data + - Generated matplotlib plots + - JSON formatted fragmentation metrics + endif # MONITOR_DEVELOPMENTAL_STATS # Future monitoring options can be added here diff --git a/playbooks/roles/fstests/tasks/install-deps/debian/main.yml b/playbooks/roles/fstests/tasks/install-deps/debian/main.yml index cbcb3788d2bd..cc4a5a6b10af 100644 --- a/playbooks/roles/fstests/tasks/install-deps/debian/main.yml +++ b/playbooks/roles/fstests/tasks/install-deps/debian/main.yml @@ -73,6 +73,7 @@ - xfsdump - cifs-utils - duperemove + - python3-bpfcc state: present update_cache: true tags: ["fstests", "deps"] diff --git a/playbooks/roles/fstests/tasks/install-deps/redhat/main.yml b/playbooks/roles/fstests/tasks/install-deps/redhat/main.yml index c1bd7f82f0aa..3c681c1a06fb 100644 --- a/playbooks/roles/fstests/tasks/install-deps/redhat/main.yml +++ b/playbooks/roles/fstests/tasks/install-deps/redhat/main.yml @@ -72,6 +72,7 @@ - gettext - ncurses - ncurses-devel + - python3-bcc - name: Install xfsprogs-xfs_scrub become: true diff --git a/playbooks/roles/fstests/tasks/install-deps/suse/main.yml b/playbooks/roles/fstests/tasks/install-deps/suse/main.yml index 3247567a8cfa..54de4a7ad3d5 100644 --- 
a/playbooks/roles/fstests/tasks/install-deps/suse/main.yml +++ b/playbooks/roles/fstests/tasks/install-deps/suse/main.yml @@ -124,6 +124,7 @@ - libcap-progs - fio - parted + - python3-bcc state: present when: - repos_present|bool diff --git a/playbooks/roles/monitoring/files/fragmentation_tracker.py b/playbooks/roles/monitoring/files/fragmentation_tracker.py new file mode 100644 index 000000000000..7b66f3232960 --- /dev/null +++ b/playbooks/roles/monitoring/files/fragmentation_tracker.py @@ -0,0 +1,533 @@ +#!/usr/bin/env python3 +""" +Enhanced eBPF-based memory fragmentation tracker. +Primary focus on mm_page_alloc_extfrag events with optional compaction tracking. +""" + +from bcc import BPF +import time +import signal +import sys +import os +import json +import argparse +from collections import defaultdict +from datetime import datetime + +# eBPF program to trace fragmentation events +bpf_program = """ +#include +#include +#include + +// Event types +#define EVENT_COMPACTION_SUCCESS 1 +#define EVENT_COMPACTION_FAILURE 2 +#define EVENT_EXTFRAG 3 + +struct fragmentation_event { + u64 timestamp; + u32 pid; + u32 tid; + u8 event_type; // 1=compact_success, 2=compact_fail, 3=extfrag + + // Common fields + u32 order; + int fragmentation_index; + int zone_idx; + int node_id; + + // ExtFrag specific fields + int fallback_order; // Order of the fallback allocation + int migrate_from; // Original migrate type + int migrate_to; // Fallback migrate type + int fallback_blocks; // Number of pageblocks involved + int is_steal; // Whether this is a steal vs claim + + // Process info + char comm[16]; +}; + +BPF_PERF_OUTPUT(events); + +// Statistics tracking +BPF_HASH(extfrag_stats, u32, u64); // Key: order, Value: count +BPF_HASH(compact_stats, u32, u64); // Key: order|success<<16, Value: count + +// Helper to get current fragmentation state (simplified) +static inline int get_fragmentation_estimate(int order) { + // This is a simplified estimate + // In real implementation, we'd 
need to walk buddy lists + // For now, return a placeholder that indicates we need fragmentation data + if (order <= 3) return 100; // Low order usually OK + if (order <= 6) return 400; // Medium order moderate frag + return 700; // High order typically fragmented +} + +// Trace external fragmentation events (page steal/claim from different migratetype) +TRACEPOINT_PROBE(kmem, mm_page_alloc_extfrag) { + struct fragmentation_event event = {}; + + event.timestamp = bpf_ktime_get_ns(); + event.pid = bpf_get_current_pid_tgid() >> 32; + event.tid = bpf_get_current_pid_tgid() & 0xFFFFFFFF; + event.event_type = EVENT_EXTFRAG; + + // Extract tracepoint arguments + // Note: Field names may vary by kernel version + // Typical fields: alloc_order, fallback_order, + // alloc_migratetype, fallback_migratetype, change_ownership + + event.order = args->alloc_order; + event.fallback_order = args->fallback_order; + event.migrate_from = args->fallback_migratetype; + event.migrate_to = args->alloc_migratetype; + + // change_ownership indicates if the whole pageblock was claimed + // 0 = steal (partial), 1 = claim (whole block) + event.is_steal = args->change_ownership ? 
0 : 1; + + // Node ID - set to -1 as page struct access is kernel-specific + // Could be enhanced with kernel version detection + event.node_id = -1; + event.zone_idx = -1; + + // Estimate fragmentation at this point + event.fragmentation_index = get_fragmentation_estimate(event.order); + + // Get process name + bpf_get_current_comm(&event.comm, sizeof(event.comm)); + + events.perf_submit(args, &event, sizeof(event)); + + // Update statistics + u64 *count = extfrag_stats.lookup(&event.order); + if (count) { + (*count)++; + } else { + u64 initial = 1; + extfrag_stats.update(&event.order, &initial); + } + + return 0; +} + +// Optional: Trace compaction success (if tracepoint exists) +#ifdef TRACE_COMPACTION +TRACEPOINT_PROBE(page_alloc, mm_compaction_success) { + struct fragmentation_event event = {}; + + event.timestamp = bpf_ktime_get_ns(); + event.pid = bpf_get_current_pid_tgid() >> 32; + event.tid = bpf_get_current_pid_tgid() & 0xFFFFFFFF; + event.event_type = EVENT_COMPACTION_SUCCESS; + + event.order = args->order; + event.fragmentation_index = args->ret; + event.zone_idx = args->idx; + event.node_id = args->nid; + + bpf_get_current_comm(&event.comm, sizeof(event.comm)); + + events.perf_submit(args, &event, sizeof(event)); + + u32 key = (event.order) | (1 << 16); // Set success bit + u64 *count = compact_stats.lookup(&key); + if (count) { + (*count)++; + } else { + u64 initial = 1; + compact_stats.update(&key, &initial); + } + + return 0; +} + +TRACEPOINT_PROBE(page_alloc, mm_compaction_failure) { + struct fragmentation_event event = {}; + + event.timestamp = bpf_ktime_get_ns(); + event.pid = bpf_get_current_pid_tgid() >> 32; + event.tid = bpf_get_current_pid_tgid() & 0xFFFFFFFF; + event.event_type = EVENT_COMPACTION_FAILURE; + + event.order = args->order; + event.fragmentation_index = -1; + event.zone_idx = -1; + event.node_id = -1; + + bpf_get_current_comm(&event.comm, sizeof(event.comm)); + + events.perf_submit(args, &event, sizeof(event)); + + u32 key = 
event.order; // No success bit + u64 *count = compact_stats.lookup(&key); + if (count) { + (*count)++; + } else { + u64 initial = 1; + compact_stats.update(&key, &initial); + } + + return 0; +} +#endif +""" + +# Migrate type names for better readability +MIGRATE_TYPES = { + 0: "UNMOVABLE", + 1: "MOVABLE", + 2: "RECLAIMABLE", + 3: "PCPTYPES", + 4: "HIGHATOMIC", + 5: "CMA", + 6: "ISOLATE", +} + + +class FragmentationTracker: + def __init__(self, verbose=True, output_file=None): + self.start_time = time.time() + self.events_data = [] + self.extfrag_stats = defaultdict(int) + self.compact_stats = defaultdict(lambda: {"success": 0, "failure": 0}) + self.zone_names = ["DMA", "DMA32", "Normal", "Movable", "Device"] + self.verbose = verbose + self.output_file = output_file + self.event_count = 0 + self.interrupted = False + + def process_event(self, cpu, data, size): + """Process a fragmentation event from eBPF.""" + event = self.b["events"].event(data) + + # Calculate relative time from start + rel_time = (event.timestamp - self.start_ns) / 1e9 + + # Decode process name + try: + comm = event.comm.decode("utf-8", "replace") + except: + comm = "unknown" + + # Determine event type and format output + if event.event_type == 3: # EXTFRAG event + event_name = "EXTFRAG" + color = "\033[93m" # Yellow + + # Get migrate type names + from_type = MIGRATE_TYPES.get( + event.migrate_from, f"TYPE_{event.migrate_from}" + ) + to_type = MIGRATE_TYPES.get(event.migrate_to, f"TYPE_{event.migrate_to}") + + # Store event data + event_dict = { + "timestamp": rel_time, + "absolute_time": datetime.now().isoformat(), + "event_type": "extfrag", + "pid": event.pid, + "tid": event.tid, + "comm": comm, + "order": event.order, + "fallback_order": event.fallback_order, + "migrate_from": from_type, + "migrate_to": to_type, + "is_steal": bool(event.is_steal), + "node": event.node_id, + "fragmentation_index": event.fragmentation_index, + } + + self.extfrag_stats[event.order] += 1 + + if self.verbose: + 
action = "steal" if event.is_steal else "claim" + print( + f"{color}[{rel_time:8.3f}s] {event_name:10s}\033[0m " + f"Order={event.order:2d} FallbackOrder={event.fallback_order:2d} " + f"{from_type:10s}->{to_type:10s} ({action}) " + f"Process={comm:12s} PID={event.pid:6d}" + ) + + elif event.event_type == 1: # COMPACTION_SUCCESS + event_name = "COMPACT_OK" + color = "\033[92m" # Green + + zone_name = ( + self.zone_names[event.zone_idx] + if 0 <= event.zone_idx < len(self.zone_names) + else "Unknown" + ) + + event_dict = { + "timestamp": rel_time, + "absolute_time": datetime.now().isoformat(), + "event_type": "compaction_success", + "pid": event.pid, + "comm": comm, + "order": event.order, + "fragmentation_index": event.fragmentation_index, + "zone": zone_name, + "node": event.node_id, + } + + self.compact_stats[event.order]["success"] += 1 + + if self.verbose: + print( + f"{color}[{rel_time:8.3f}s] {event_name:10s}\033[0m " + f"Order={event.order:2d} FragIdx={event.fragmentation_index:5d} " + f"Zone={zone_name:8s} Node={event.node_id:2d} " + f"Process={comm:12s} PID={event.pid:6d}" + ) + + else: # COMPACTION_FAILURE + event_name = "COMPACT_FAIL" + color = "\033[91m" # Red + + event_dict = { + "timestamp": rel_time, + "absolute_time": datetime.now().isoformat(), + "event_type": "compaction_failure", + "pid": event.pid, + "comm": comm, + "order": event.order, + "fragmentation_index": -1, + } + + self.compact_stats[event.order]["failure"] += 1 + + if self.verbose: + print( + f"{color}[{rel_time:8.3f}s] {event_name:10s}\033[0m " + f"Order={event.order:2d} " + f"Process={comm:12s} PID={event.pid:6d}" + ) + + self.events_data.append(event_dict) + self.event_count += 1 + + def print_summary(self): + """Print summary statistics.""" + print("\n" + "=" * 80) + print("FRAGMENTATION TRACKING SUMMARY") + print("=" * 80) + + total_events = len(self.events_data) + print(f"\nTotal events captured: {total_events}") + + if total_events > 0: + # Count by type + extfrag_count = sum( + 
1 for e in self.events_data if e["event_type"] == "extfrag" + ) + compact_success = sum( + 1 for e in self.events_data if e["event_type"] == "compaction_success" + ) + compact_fail = sum( + 1 for e in self.events_data if e["event_type"] == "compaction_failure" + ) + + print(f"\nEvent breakdown:") + print(f" External Fragmentation: {extfrag_count}") + print(f" Compaction Success: {compact_success}") + print(f" Compaction Failure: {compact_fail}") + + # ExtFrag analysis + if extfrag_count > 0: + print("\nExternal Fragmentation Events by Order:") + print("-" * 40) + print(f"{'Order':<8} {'Count':<10} {'Percentage':<10}") + print("-" * 40) + + for order in sorted(self.extfrag_stats.keys()): + count = self.extfrag_stats[order] + pct = (count / extfrag_count) * 100 + print(f"{order:<8} {count:<10} {pct:<10.1f}%") + + # Analyze migrate type patterns + extfrag_events = [ + e for e in self.events_data if e["event_type"] == "extfrag" + ] + migrate_patterns = defaultdict(int) + steal_vs_claim = {"steal": 0, "claim": 0} + + for e in extfrag_events: + pattern = f"{e['migrate_from']}->{e['migrate_to']}" + migrate_patterns[pattern] += 1 + if e["is_steal"]: + steal_vs_claim["steal"] += 1 + else: + steal_vs_claim["claim"] += 1 + + print("\nMigrate Type Patterns:") + print("-" * 40) + for pattern, count in sorted( + migrate_patterns.items(), key=lambda x: x[1], reverse=True + )[:5]: + print( + f" {pattern:<30} {count:5d} ({count/extfrag_count*100:5.1f}%)" + ) + + print(f"\nSteal vs Claim:") + print( + f" Steal (partial): {steal_vs_claim['steal']} ({steal_vs_claim['steal']/extfrag_count*100:.1f}%)" + ) + print( + f" Claim (whole): {steal_vs_claim['claim']} ({steal_vs_claim['claim']/extfrag_count*100:.1f}%)" + ) + + # Compaction analysis + if self.compact_stats: + print("\nCompaction Events by Order:") + print("-" * 40) + print( + f"{'Order':<8} {'Success':<10} {'Failure':<10} {'Total':<10} {'Success%':<10}" + ) + print("-" * 40) + + for order in sorted(self.compact_stats.keys()): + 
stats = self.compact_stats[order] + total = stats["success"] + stats["failure"] + success_pct = (stats["success"] / total * 100) if total > 0 else 0 + print( + f"{order:<8} {stats['success']:<10} {stats['failure']:<10} " + f"{total:<10} {success_pct:<10.1f}" + ) + + def save_data(self, filename=None): + """Save captured data to JSON file for visualization.""" + if filename is None and self.output_file: + filename = self.output_file + + if filename is None: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + filename = f"fragmentation_data_{timestamp}.json" + + # Prepare statistics + stats = {} + + # ExtFrag stats + stats["extfrag"] = dict(self.extfrag_stats) + + # Compaction stats + stats["compaction"] = {} + for order, counts in self.compact_stats.items(): + stats["compaction"][str(order)] = counts + + output = { + "metadata": { + "start_time": self.start_time, + "end_time": time.time(), + "duration": time.time() - self.start_time, + "total_events": len(self.events_data), + "kernel_version": os.uname().release, + }, + "events": self.events_data, + "statistics": stats, + } + + with open(filename, "w") as f: + json.dump(output, f, indent=2) + print(f"\nData saved to {filename}") + return filename + + def run(self): + """Main execution loop.""" + print("Compiling eBPF program...") + + # Check if compaction tracepoints are available + has_compaction = os.path.exists( + "/sys/kernel/debug/tracing/events/page_alloc/mm_compaction_success" + ) + + # Modify BPF program based on available tracepoints + program = bpf_program + if has_compaction: + program = program.replace("#ifdef TRACE_COMPACTION", "#if 1") + print(" Compaction tracepoints: AVAILABLE") + else: + program = program.replace("#ifdef TRACE_COMPACTION", "#if 0") + print(" Compaction tracepoints: NOT AVAILABLE (will track extfrag only)") + + self.b = BPF(text=program) + self.start_ns = time.perf_counter_ns() + + # Setup event handler + self.b["events"].open_perf_buffer(self.process_event) + + # Determine
output filename upfront + if self.output_file: + save_file = self.output_file + else: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + save_file = f"fragmentation_data_{timestamp}.json" + + print("\nStarting fragmentation event tracking...") + print(f"Primary focus: mm_page_alloc_extfrag events") + print(f"Data will be saved to: {save_file}") + print("Press Ctrl+C to stop and see summary\n") + print("-" * 80) + print(f"{'Time':>10s} {'Event':>12s} {'Details'}") + print("-" * 80) + + try: + while not self.interrupted: + self.b.perf_buffer_poll() + except KeyboardInterrupt: + self.interrupted = True + finally: + # Always save data on exit + self.print_summary() + self.save_data() + + +def main(): + parser = argparse.ArgumentParser( + description="Track memory fragmentation events using eBPF" + ) + parser.add_argument("-o", "--output", help="Output JSON file") + parser.add_argument( + "-t", "--time", type=int, help="Run for specified seconds then exit" + ) + parser.add_argument( + "-q", + "--quiet", + action="store_true", + help="Suppress event output (summary only)", + ) + + args = parser.parse_args() + + # Check for root privileges + if os.geteuid() != 0: + print("This script must be run as root (uses eBPF)") + sys.exit(1) + + # Create tracker instance + tracker = FragmentationTracker(verbose=not args.quiet, output_file=args.output) + + # Set up signal handler + def signal_handler_with_tracker(sig, frame): + tracker.interrupted = True + + signal.signal(signal.SIGINT, signal_handler_with_tracker) + + if args.time: + # Run for specified time + import threading + + def timeout_handler(): + time.sleep(args.time) + tracker.interrupted = True + + timer = threading.Thread(target=timeout_handler) + timer.daemon = True + timer.start() + + tracker.run() + + +if __name__ == "__main__": + main() diff --git a/playbooks/roles/monitoring/files/fragmentation_visualizer.py b/playbooks/roles/monitoring/files/fragmentation_visualizer.py new file mode 100644 index 
000000000000..f3891de61e35 --- /dev/null +++ b/playbooks/roles/monitoring/files/fragmentation_visualizer.py @@ -0,0 +1,1161 @@ +#!/usr/bin/env python3 +""" +Enhanced fragmentation A/B comparison with overlaid visualizations. +Combines datasets on the same graphs using different visual markers. + +Usage: + python fragmentation_visualizer.py fragmentation_data_A.json --compare fragmentation_data_B.json -o comparison.png +""" +import json +import sys +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.gridspec as gridspec +import matplotlib.patches as mpatches +from matplotlib.patches import Rectangle +from datetime import datetime +import argparse +from collections import defaultdict + + +def load_data(filename): + with open(filename, "r") as f: + return json.load(f) + + +def get_dot_size(order: int) -> float: + base_size = 1 + return base_size + (order * 2) + + +def build_counts(events, bin_size): + if not events: + return np.array([]), np.array([]) + times = np.array([e["timestamp"] for e in events], dtype=float) + tmin, tmax = times.min(), times.max() + if tmax == tmin: + tmax = tmin + bin_size + bins = np.arange(tmin, tmax + bin_size, bin_size) + counts, edges = np.histogram(times, bins=bins) + centers = (edges[:-1] + edges[1:]) / 2.0 + return centers, counts + + +def get_migrate_type_color(mtype): + """Get consistent colors for migrate types""" + colors = { + "UNMOVABLE": "#e74c3c", # Red + "MOVABLE": "#2ecc71", # Green + "RECLAIMABLE": "#f39c12", # Orange + "PCPTYPES": "#9b59b6", # Purple + "HIGHATOMIC": "#e91e63", # Pink + "ISOLATE": "#607d8b", # Blue-grey + "CMA": "#00bcd4", # Cyan + } + return colors.get(mtype, "#95a5a6") + + +def get_migration_severity(from_type, to_type): + """Determine migration severity score""" + if to_type == "UNMOVABLE": + return -2 # Very bad + elif from_type == "UNMOVABLE": + return 1 # Good + elif to_type == "MOVABLE": + return 1 # Good + elif from_type == "MOVABLE" and to_type == "RECLAIMABLE": + return -1 # Somewhat bad + elif
to_type == "RECLAIMABLE": + return -0.5 # Slightly bad + return 0 + + +def get_severity_color(severity_score): + """Get color based on severity score""" + if severity_score <= -2: + return "#8b0000" # Dark red + elif severity_score <= -1: + return "#ff6b6b" # Light red + elif severity_score >= 1: + return "#51cf66" # Green + else: + return "#ffd43b" # Yellow + + +def create_overlaid_compaction_graph(ax, data_a, data_b, labels): + """Create overlaid compaction events graph""" + + # Process dataset A + events_a = data_a.get("events", []) + compact_a = [ + e + for e in events_a + if e["event_type"] in ["compaction_success", "compaction_failure"] + ] + success_a = [e for e in compact_a if e["event_type"] == "compaction_success"] + failure_a = [e for e in compact_a if e["event_type"] == "compaction_failure"] + + # Process dataset B + events_b = data_b.get("events", []) + compact_b = [ + e + for e in events_b + if e["event_type"] in ["compaction_success", "compaction_failure"] + ] + success_b = [e for e in compact_b if e["event_type"] == "compaction_success"] + failure_b = [e for e in compact_b if e["event_type"] == "compaction_failure"] + + # Plot A with circles + for e in success_a: + ax.scatter( + e["timestamp"], + e.get("fragmentation_index", 0), + s=get_dot_size(e["order"]), + c="#2ecc71", + alpha=0.3, + edgecolors="none", + marker="o", + label=None, + ) + + for i, e in enumerate(failure_a): + y_pos = -50 - (e["order"] * 10) + ax.scatter( + e["timestamp"], + y_pos, + s=get_dot_size(e["order"]), + c="#e74c3c", + alpha=0.3, + edgecolors="none", + marker="o", + label=None, + ) + + # Plot B with triangles + for e in success_b: + ax.scatter( + e["timestamp"], + e.get("fragmentation_index", 0), + s=get_dot_size(e["order"]) * 1.2, + c="#27ae60", + alpha=0.4, + edgecolors="black", + linewidths=0.5, + marker="^", + label=None, + ) + + for i, e in enumerate(failure_b): + y_pos = -55 - (e["order"] * 10) # Slightly offset from A + ax.scatter( + e["timestamp"], + y_pos, + 
s=get_dot_size(e["order"]) * 1.2, + c="#c0392b", + alpha=0.4, + edgecolors="black", + linewidths=0.5, + marker="^", + label=None, + ) + + # Set y-axis limits - cap at 1000, ignore data above + all_y_values = [] + if success_a: + all_y_values.extend( + [ + e.get("fragmentation_index", 0) + for e in success_a + if e.get("fragmentation_index", 0) <= 1000 + ] + ) + if success_b: + all_y_values.extend( + [ + e.get("fragmentation_index", 0) + for e in success_b + if e.get("fragmentation_index", 0) <= 1000 + ] + ) + + max_y = max(all_y_values) if all_y_values else 1000 + min_y = -200 # Fixed minimum for failure lanes + ax.set_ylim(min_y, min(max_y + 100, 1000)) + + # Create legend - position above the data + from matplotlib.lines import Line2D + + legend_elements = [ + Line2D( + [0], + [0], + marker="o", + color="w", + markerfacecolor="#2ecc71", + markersize=8, + alpha=0.6, + label=f"{labels[0]} Success", + ), + Line2D( + [0], + [0], + marker="o", + color="w", + markerfacecolor="#e74c3c", + markersize=8, + alpha=0.6, + label=f"{labels[0]} Failure", + ), + Line2D( + [0], + [0], + marker="^", + color="w", + markerfacecolor="#27ae60", + markersize=8, + alpha=0.6, + label=f"{labels[1]} Success", + ), + Line2D( + [0], + [0], + marker="^", + color="w", + markerfacecolor="#c0392b", + markersize=8, + alpha=0.6, + label=f"{labels[1]} Failure", + ), + ] + # Position legend at y=0 on the left side where there's no data + ax.legend( + handles=legend_elements, + loc="center left", + bbox_to_anchor=(0.02, 0.5), + ncol=1, + fontsize=8, + frameon=True, + fancybox=True, + ) + + # Styling + ax.axhline(y=0, color="#34495e", linestyle="-", linewidth=1.5, alpha=0.8) + ax.grid(True, alpha=0.08, linestyle=":", linewidth=0.5) + ax.set_xlabel("Time (seconds)", fontsize=11) + ax.set_ylabel("Fragmentation Index", fontsize=11) + ax.set_title( + "Compaction Events Comparison (○ = A, △ = B)", + fontsize=13, + fontweight="bold", + pad=20, + ) + + +def create_overlaid_extfrag_timeline(ax, data_a, 
data_b, labels, bin_size=0.5): + """Create overlaid ExtFrag timeline""" + + events_a = [e for e in data_a.get("events", []) if e["event_type"] == "extfrag"] + events_b = [e for e in data_b.get("events", []) if e["event_type"] == "extfrag"] + + # Dataset A - solid lines + steal_a = [e for e in events_a if e.get("is_steal")] + claim_a = [e for e in events_a if not e.get("is_steal")] + + steal_times_a, steal_counts_a = build_counts(steal_a, bin_size) + claim_times_a, claim_counts_a = build_counts(claim_a, bin_size) + + if steal_times_a.size > 0: + ax.plot( + steal_times_a, + steal_counts_a, + linewidth=2, + color="#3498db", + alpha=0.8, + label=f"{labels[0]} Steal", + linestyle="-", + ) + ax.fill_between(steal_times_a, 0, steal_counts_a, alpha=0.15, color="#3498db") + + if claim_times_a.size > 0: + ax.plot( + claim_times_a, + claim_counts_a, + linewidth=2, + color="#e67e22", + alpha=0.8, + label=f"{labels[0]} Claim", + linestyle="-", + ) + ax.fill_between(claim_times_a, 0, claim_counts_a, alpha=0.15, color="#e67e22") + + # Dataset B - dashed lines + steal_b = [e for e in events_b if e.get("is_steal")] + claim_b = [e for e in events_b if not e.get("is_steal")] + + steal_times_b, steal_counts_b = build_counts(steal_b, bin_size) + claim_times_b, claim_counts_b = build_counts(claim_b, bin_size) + + if steal_times_b.size > 0: + ax.plot( + steal_times_b, + steal_counts_b, + linewidth=2, + color="#2980b9", + alpha=0.8, + label=f"{labels[1]} Steal", + linestyle="--", + ) + + if claim_times_b.size > 0: + ax.plot( + claim_times_b, + claim_counts_b, + linewidth=2, + color="#d35400", + alpha=0.8, + label=f"{labels[1]} Claim", + linestyle="--", + ) + + ax.legend(loc="upper right", frameon=True, fontsize=9, ncol=2) + ax.set_xlabel("Time (seconds)", fontsize=11) + ax.set_ylabel(f"Events per {bin_size}s", fontsize=11) + ax.set_title( + "ExtFrag Events Timeline (Solid = A, Dashed = B)", + fontsize=12, + fontweight="semibold", + ) + ax.grid(True, alpha=0.06, linestyle=":", 
linewidth=0.5) + + +def create_combined_migration_heatmap(ax, data_a, data_b, labels): + """Create combined migration pattern heatmap""" + + events_a = [e for e in data_a.get("events", []) if e["event_type"] == "extfrag"] + events_b = [e for e in data_b.get("events", []) if e["event_type"] == "extfrag"] + + if not events_a and not events_b: + ax.text( + 0.5, + 0.5, + "No external fragmentation events", + ha="center", + va="center", + fontsize=12, + ) + ax.axis("off") + return + + # Combine all events to get unified time range and patterns + all_events = events_a + events_b + times = [e["timestamp"] for e in all_events] + min_time, max_time = min(times), max(times) + + # Create time bins + n_bins = min(25, max(15, int((max_time - min_time) / 10))) + time_bins = np.linspace(min_time, max_time, n_bins + 1) + time_centers = (time_bins[:-1] + time_bins[1:]) / 2 + + # Get all unique patterns from both datasets + all_patterns = set() + for e in all_events: + all_patterns.add(f"{e['migrate_from']}→{e['migrate_to']}") + + # Calculate pattern severities and sort + pattern_severities = {} + for pattern in all_patterns: + from_type, to_type = pattern.split("→") + pattern_severities[pattern] = get_migration_severity(from_type, to_type) + + sorted_patterns = sorted(all_patterns, key=lambda p: (pattern_severities[p], p)) + + # Create separate heatmaps for A and B + heatmap_a = np.zeros((len(sorted_patterns), len(time_centers))) + heatmap_b = np.zeros((len(sorted_patterns), len(time_centers))) + + # Fill heatmap A + for e in events_a: + pattern = f"{e['migrate_from']}→{e['migrate_to']}" + pattern_idx = sorted_patterns.index(pattern) + bin_idx = np.digitize(e["timestamp"], time_bins) - 1 + if 0 <= bin_idx < len(time_centers): + heatmap_a[pattern_idx, bin_idx] += 1 + + # Fill heatmap B + for e in events_b: + pattern = f"{e['migrate_from']}→{e['migrate_to']}" + pattern_idx = sorted_patterns.index(pattern) + bin_idx = np.digitize(e["timestamp"], time_bins) - 1 + if 0 <= bin_idx < 
len(time_centers): + heatmap_b[pattern_idx, bin_idx] += 1 + + # Combine heatmaps: A in upper half of cell, B in lower half + + # Plot base grid + for i in range(len(sorted_patterns)): + for j in range(len(time_centers)): + # Draw cell background based on severity + severity = pattern_severities[sorted_patterns[i]] + base_color = get_severity_color(severity) + rect = Rectangle( + (j - 0.5, i - 0.5), + 1, + 1, + facecolor=base_color, + alpha=0.1, + edgecolor="gray", + linewidth=0.5, + ) + ax.add_patch(rect) + + # Add counts for A (upper half) + if heatmap_a[i, j] > 0: + rect_a = Rectangle( + (j - 0.4, i), 0.8, 0.4, facecolor="#3498db", alpha=0.6 + ) + ax.add_patch(rect_a) + ax.text( + j, + i + 0.2, + str(int(heatmap_a[i, j])), + ha="center", + va="center", + fontsize=6, + color="white", + fontweight="bold", + ) + + # Add counts for B (lower half) + if heatmap_b[i, j] > 0: + rect_b = Rectangle( + (j - 0.4, i - 0.4), 0.8, 0.4, facecolor="#e67e22", alpha=0.6 + ) + ax.add_patch(rect_b) + ax.text( + j, + i - 0.2, + str(int(heatmap_b[i, j])), + ha="center", + va="center", + fontsize=6, + color="white", + fontweight="bold", + ) + + # Set axes - extend left margin for severity indicators + ax.set_xlim(-2.5, len(time_centers) - 0.5) + ax.set_ylim(-0.5, len(sorted_patterns) - 0.5) + + # Set x-axis (time) + ax.set_xticks(np.arange(len(time_centers))) + ax.set_xticklabels( + [f"{t:.0f}s" for t in time_centers], rotation=45, ha="right", fontsize=8 + ) + + # Set y-axis with severity indicators + ax.set_yticks(np.arange(len(sorted_patterns))) + y_labels = [] + + for i, pattern in enumerate(sorted_patterns): + severity = pattern_severities[pattern] + + # Add colored severity indicator on the left side (in data coordinates) + severity_color = get_severity_color(severity) + rect = Rectangle( + (-1.8, i - 0.4), + 1.0, + 0.8, + facecolor=severity_color, + alpha=0.8, + edgecolor="black", + linewidth=0.5, + clip_on=False, + ) + 
ax.add_patch(rect) + + # Add severity symbol + if severity <= -2: + symbol = "!!" + elif severity <= -1: + symbol = "!" + elif severity >= 1: + symbol = "+" + else: + symbol = "=" + + ax.text( + -1.3, + i, + symbol, + ha="center", + va="center", + fontsize=8, + fontweight="bold", + color="white" if abs(severity) > 0 else "black", + ) + + y_labels.append(pattern) + + ax.set_yticklabels(y_labels, fontsize=8) + + # Add legend + legend_elements = [ + mpatches.Patch(color="#3498db", alpha=0.6, label=f"{labels[0]} (upper)"), + mpatches.Patch(color="#e67e22", alpha=0.6, label=f"{labels[1]} (lower)"), + mpatches.Patch(color="#8b0000", alpha=0.8, label="Bad migration"), + mpatches.Patch(color="#51cf66", alpha=0.8, label="Good migration"), + ] + ax.legend( + handles=legend_elements, + loc="upper right", + bbox_to_anchor=(1.15, 1.0), + fontsize=8, + frameon=True, + ) + + # Styling + ax.set_xlabel("Time", fontsize=11) + ax.set_ylabel("Migration Pattern", fontsize=11) + ax.set_title( + "Migration Patterns Comparison (Blue = A, Orange = B)", + fontsize=12, + fontweight="semibold", + ) + ax.grid(False) + + +def create_comparison_statistics_table(ax, data_a, data_b, labels): + """Create comparison statistics table""" + ax.axis("off") + + # Calculate metrics + def calculate_metrics(data): + events = data.get("events", []) + compact = [ + e + for e in events + if e["event_type"] in ["compaction_success", "compaction_failure"] + ] + extfrag = [e for e in events if e["event_type"] == "extfrag"] + + compact_success = sum( + 1 for e in compact if e["event_type"] == "compaction_success" + ) + success_rate = (compact_success / len(compact) * 100) if compact else 0 + + bad = sum( + 1 + for e in extfrag + if get_migration_severity(e["migrate_from"], e["migrate_to"]) < 0 + ) + good = sum( + 1 + for e in extfrag + if get_migration_severity(e["migrate_from"], e["migrate_to"]) > 0 + ) + + steal = sum(1 for e in extfrag if e.get("is_steal")) + claim = len(extfrag) - steal if extfrag else 0 + + 
return { + "total": len(events), + "compact_success_rate": success_rate, + "extfrag": len(extfrag), + "bad_migrations": bad, + "good_migrations": good, + "steal": steal, + "claim": claim, + } + + metrics_a = calculate_metrics(data_a) + metrics_b = calculate_metrics(data_b) + + # Create table data + headers = ["Metric", labels[0], labels[1], "Better"] + rows = [ + [ + "Total Events", + metrics_a["total"], + metrics_b["total"], + "=" if metrics_a["total"] == metrics_b["total"] else "", + ], + [ + "Compaction Success Rate", + f"{metrics_a['compact_success_rate']:.1f}%", + f"{metrics_b['compact_success_rate']:.1f}%", + ( + labels[0] + if metrics_a["compact_success_rate"] > metrics_b["compact_success_rate"] + else ( + labels[1] + if metrics_b["compact_success_rate"] + > metrics_a["compact_success_rate"] + else "=" + ) + ), + ], + [ + "ExtFrag Events", + metrics_a["extfrag"], + metrics_b["extfrag"], + ( + labels[0] + if metrics_a["extfrag"] < metrics_b["extfrag"] + else labels[1] if metrics_b["extfrag"] < metrics_a["extfrag"] else "=" + ), + ], + [ + "Bad Migrations", + metrics_a["bad_migrations"], + metrics_b["bad_migrations"], + ( + labels[0] + if metrics_a["bad_migrations"] < metrics_b["bad_migrations"] + else ( + labels[1] + if metrics_b["bad_migrations"] < metrics_a["bad_migrations"] + else "=" + ) + ), + ], + [ + "Good Migrations", + metrics_a["good_migrations"], + metrics_b["good_migrations"], + ( + labels[0] + if metrics_a["good_migrations"] > metrics_b["good_migrations"] + else ( + labels[1] + if metrics_b["good_migrations"] > metrics_a["good_migrations"] + else "=" + ) + ), + ], + ["Steal Events", metrics_a["steal"], metrics_b["steal"], ""], + ["Claim Events", metrics_a["claim"], metrics_b["claim"], ""], + ] + + # Create table - position closer to title + table = ax.table( + cellText=rows, + colLabels=headers, + cellLoc="center", + loc="center", + colWidths=[0.35, 0.25, 0.25, 0.15], + bbox=[0.1, 0.15, 0.8, 0.65], + ) # Center table with margins + + 
table.auto_set_font_size(False) + table.set_fontsize(10) + table.scale(1, 2.2) # Make cells taller for better readability + + # Add padding to cells for better spacing + for key, cell in table.get_celld().items(): + cell.set_height(0.08) # Increase cell height + cell.PAD = 0.05 # Add internal padding + + # Color cells based on which is better + for i in range(1, len(rows) + 1): + row = rows[i - 1] + if row[3] == labels[0]: + table[(i, 1)].set_facecolor("#d4edda") + table[(i, 2)].set_facecolor("#f8d7da") + elif row[3] == labels[1]: + table[(i, 1)].set_facecolor("#f8d7da") + table[(i, 2)].set_facecolor("#d4edda") + + # Position title with more space from previous graph + ax.set_title( + "\n\nStatistical Comparison (Green = Better, Red = Worse)", + fontsize=13, + fontweight="bold", + pad=5, + y=1.0, + ) + + +def create_single_dashboard(data, output_file=None, bin_size=0.5): + """Create single dataset analysis dashboard with severity indicators""" + + # Create figure + fig = plt.figure(figsize=(20, 16), constrained_layout=False) + fig.patch.set_facecolor("#f8f9fa") + + # Create grid layout - 3 rows for single analysis + gs = gridspec.GridSpec(3, 1, height_ratios=[2.5, 2, 3], hspace=0.3, figure=fig) + + # Create subplots + ax_compact = fig.add_subplot(gs[0]) + ax_extfrag = fig.add_subplot(gs[1]) + ax_migration = fig.add_subplot(gs[2]) + + # Process events + events = data.get("events", []) + compact_events = [ + e + for e in events + if e["event_type"] in ["compaction_success", "compaction_failure"] + ] + extfrag_events = [e for e in events if e["event_type"] == "extfrag"] + + success_events = [ + e for e in compact_events if e["event_type"] == "compaction_success" + ] + failure_events = [ + e for e in compact_events if e["event_type"] == "compaction_failure" + ] + + # === COMPACTION GRAPH === + if compact_events: + for e in success_events: + if e.get("fragmentation_index", 0) <= 1000: # Cap at 1000 + ax_compact.scatter( + e["timestamp"], + e.get("fragmentation_index", 
0), + s=get_dot_size(e["order"]), + c="#2ecc71", + alpha=0.3, + edgecolors="none", + ) + + for i, e in enumerate(failure_events): + y_pos = -50 - (e["order"] * 10) + ax_compact.scatter( + e["timestamp"], + y_pos, + s=get_dot_size(e["order"]), + c="#e74c3c", + alpha=0.3, + edgecolors="none", + ) + + ax_compact.axhline( + y=0, color="#34495e", linestyle="-", linewidth=1.5, alpha=0.8 + ) + ax_compact.grid(True, alpha=0.08, linestyle=":", linewidth=0.5) + ax_compact.set_ylim(-200, 1000) + + # Add statistics + success_rate = ( + len(success_events) / len(compact_events) * 100 if compact_events else 0 + ) + stats_text = f"Success: {len(success_events)}/{len(compact_events)} ({success_rate:.1f}%)" + ax_compact.text( + 0.02, + 0.98, + stats_text, + transform=ax_compact.transAxes, + fontsize=10, + verticalalignment="top", + bbox=dict(boxstyle="round,pad=0.5", facecolor="white", alpha=0.9), + ) + + ax_compact.set_xlabel("Time (seconds)", fontsize=11) + ax_compact.set_ylabel("Fragmentation Index", fontsize=11) + ax_compact.set_title("Compaction Events Over Time", fontsize=13, fontweight="bold") + + # === EXTFRAG TIMELINE === + if extfrag_events: + steal_events = [e for e in extfrag_events if e.get("is_steal")] + claim_events = [e for e in extfrag_events if not e.get("is_steal")] + + steal_times, steal_counts = build_counts(steal_events, bin_size) + claim_times, claim_counts = build_counts(claim_events, bin_size) + + if steal_times.size > 0: + ax_extfrag.fill_between( + steal_times, 0, steal_counts, alpha=0.3, color="#3498db" + ) + ax_extfrag.plot( + steal_times, + steal_counts, + linewidth=2, + color="#2980b9", + alpha=0.8, + label=f"Steal ({len(steal_events)})", + ) + + if claim_times.size > 0: + ax_extfrag.fill_between( + claim_times, 0, claim_counts, alpha=0.3, color="#e67e22" + ) + ax_extfrag.plot( + claim_times, + claim_counts, + linewidth=2, + color="#d35400", + alpha=0.8, + label=f"Claim ({len(claim_events)})", + ) + + ax_extfrag.legend(loc="upper right", frameon=True, 
fontsize=9) + + # Add bad/good migration counts + bad_migrations = sum( + 1 + for e in extfrag_events + if get_migration_severity(e["migrate_from"], e["migrate_to"]) < 0 + ) + good_migrations = sum( + 1 + for e in extfrag_events + if get_migration_severity(e["migrate_from"], e["migrate_to"]) > 0 + ) + + migration_text = f"Bad: {bad_migrations} | Good: {good_migrations}" + ax_extfrag.text( + 0.02, + 0.98, + migration_text, + transform=ax_extfrag.transAxes, + fontsize=10, + verticalalignment="top", + bbox=dict(boxstyle="round,pad=0.5", facecolor="white", alpha=0.9), + ) + + ax_extfrag.set_xlabel("Time (seconds)", fontsize=11) + ax_extfrag.set_ylabel(f"Events per {bin_size}s", fontsize=11) + ax_extfrag.set_title( + "External Fragmentation Events Timeline", fontsize=12, fontweight="semibold" + ) + ax_extfrag.grid(True, alpha=0.06, linestyle=":", linewidth=0.5) + + # === MIGRATION HEATMAP WITH SEVERITY === + create_single_migration_heatmap(ax_migration, extfrag_events) + + # Super title + fig.suptitle( + "Memory Fragmentation Analysis", fontsize=18, fontweight="bold", y=0.98 + ) + + # Footer + timestamp_text = f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}" + fig.text( + 0.98, + 0.01, + timestamp_text, + ha="right", + fontsize=9, + style="italic", + color="#7f8c8d", + ) + + # Adjust layout + plt.subplots_adjust(left=0.08, right=0.95, top=0.94, bottom=0.03) + + # Save + if output_file is None: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_file = f"fragmentation_analysis_{timestamp}.png" + + plt.savefig(output_file, dpi=120, bbox_inches="tight", facecolor="#f8f9fa") + plt.close(fig) + return output_file + + +def create_single_migration_heatmap(ax, extfrag_events): + """Create migration heatmap for single dataset with severity indicators""" + if not extfrag_events: + ax.text( + 0.5, + 0.5, + "No external fragmentation events", + ha="center", + va="center", + fontsize=12, + ) + ax.axis("off") + return + + # Get time range and create bins + 
times = [e["timestamp"] for e in extfrag_events] + min_time, max_time = min(times), max(times) + + n_bins = min(25, max(15, int((max_time - min_time) / 10))) + time_bins = np.linspace(min_time, max_time, n_bins + 1) + time_centers = (time_bins[:-1] + time_bins[1:]) / 2 + + # Get patterns and calculate severities + patterns = {} + pattern_events = defaultdict(list) + + for e in extfrag_events: + pattern = f"{e['migrate_from']}→{e['migrate_to']}" + pattern_events[pattern].append(e) + if pattern not in patterns: + patterns[pattern] = { + "from": e["migrate_from"], + "to": e["migrate_to"], + "total": 0, + "steal": 0, + "claim": 0, + "severity": get_migration_severity(e["migrate_from"], e["migrate_to"]), + } + patterns[pattern]["total"] += 1 + if e.get("is_steal"): + patterns[pattern]["steal"] += 1 + else: + patterns[pattern]["claim"] += 1 + + # Sort by severity then count + sorted_patterns = sorted( + patterns.keys(), key=lambda p: (patterns[p]["severity"], -patterns[p]["total"]) + ) + + # Create heatmap data + heatmap_data = np.zeros((len(sorted_patterns), len(time_centers))) + + for i, pattern in enumerate(sorted_patterns): + for e in pattern_events[pattern]: + bin_idx = np.digitize(e["timestamp"], time_bins) - 1 + if 0 <= bin_idx < len(time_centers): + heatmap_data[i, bin_idx] += 1 + + # Plot heatmap + from matplotlib.colors import LinearSegmentedColormap + + colors = ["#ffffff", "#ffeb3b", "#ff9800", "#f44336"] # White to red + cmap = LinearSegmentedColormap.from_list("intensity", colors, N=256) + + im = ax.imshow(heatmap_data, aspect="auto", cmap=cmap, vmin=0, alpha=0.8) + + # Overlay counts + for i in range(len(sorted_patterns)): + for j in range(len(time_centers)): + if heatmap_data[i, j] > 0: + count = int(heatmap_data[i, j]) + color = ( + "white" if heatmap_data[i, j] > heatmap_data.max() / 2 else "black" + ) + ax.text( + j, + i, + str(count), + ha="center", + va="center", + fontsize=6, + fontweight="bold", + color=color, + ) + + # Set axes + ax.set_xlim(-2.5, 
len(time_centers) - 0.5) + ax.set_ylim(-0.5, len(sorted_patterns) - 0.5) + + # Set x-axis + ax.set_xticks(np.arange(len(time_centers))) + ax.set_xticklabels( + [f"{t:.0f}s" for t in time_centers], rotation=45, ha="right", fontsize=8 + ) + + # Set y-axis with severity indicators + ax.set_yticks(np.arange(len(sorted_patterns))) + y_labels = [] + + for i, pattern in enumerate(sorted_patterns): + severity = patterns[pattern]["severity"] + severity_color = get_severity_color(severity) + + # Add severity indicator + rect = Rectangle( + (-2.3, i - 0.4), + 1.5, + 0.8, + facecolor=severity_color, + alpha=0.8, + edgecolor="black", + linewidth=0.5, + clip_on=False, + ) + ax.add_patch(rect) + + # Add symbol + if severity <= -2: + symbol = "!!" + elif severity <= -1: + symbol = "!" + elif severity >= 1: + symbol = "+" + else: + symbol = "=" + + ax.text( + -1.55, + i, + symbol, + ha="center", + va="center", + fontsize=8, + fontweight="bold", + color="white" if abs(severity) > 0 else "black", + ) + + # Format label + total = patterns[pattern]["total"] + steal = patterns[pattern]["steal"] + claim = patterns[pattern]["claim"] + label = f"{pattern} ({total}: {steal}s/{claim}c)" + y_labels.append(label) + + ax.set_yticklabels(y_labels, fontsize=8) + + # Add colorbar + cbar = plt.colorbar(im, ax=ax, orientation="vertical", pad=0.02, aspect=30) + cbar.set_label("Event Intensity", fontsize=9) + cbar.ax.tick_params(labelsize=8) + + # Add severity legend + bad_patch = mpatches.Patch(color="#8b0000", label="Bad (→UNMOVABLE)", alpha=0.8) + good_patch = mpatches.Patch(color="#51cf66", label="Good (→MOVABLE)", alpha=0.8) + neutral_patch = mpatches.Patch(color="#ffd43b", label="Neutral", alpha=0.8) + + ax.legend( + handles=[bad_patch, neutral_patch, good_patch], + loc="upper right", + bbox_to_anchor=(1.15, 1.0), + title="Migration Impact", + fontsize=8, + title_fontsize=9, + ) + + # Styling + ax.set_xlabel("Time", fontsize=11) + ax.set_ylabel("Migration Pattern", fontsize=11) + ax.set_title( + 
"Migration Patterns Timeline with Severity Indicators", + fontsize=12, + fontweight="semibold", + ) + ax.grid(False) + + # Add grid lines + for i in range(len(sorted_patterns) + 1): + ax.axhline(i - 0.5, color="gray", linewidth=0.5, alpha=0.3) + for j in range(len(time_centers) + 1): + ax.axvline(j - 0.5, color="gray", linewidth=0.5, alpha=0.3) + + # Summary + total_events = len(extfrag_events) + bad_events = sum( + patterns[p]["total"] for p in patterns if patterns[p]["severity"] < 0 + ) + good_events = sum( + patterns[p]["total"] for p in patterns if patterns[p]["severity"] > 0 + ) + + summary = f"Total: {total_events} | Bad: {bad_events} | Good: {good_events}" + ax.text( + 0.5, + -0.12, + summary, + transform=ax.transAxes, + ha="center", + fontsize=9, + style="italic", + color="#7f8c8d", + ) + + +def create_comparison_dashboard(data_a, data_b, labels, output_file=None): + """Create comprehensive comparison dashboard""" + + # Create figure + fig = plt.figure(figsize=(20, 18), constrained_layout=False) + fig.patch.set_facecolor("#f8f9fa") + + # Create grid layout - 4 rows, single column with more space for stats + gs = gridspec.GridSpec( + 4, 1, height_ratios=[2.5, 2, 2.5, 1.5], hspace=0.45, figure=fig + ) + + # Create subplots + ax_compact = fig.add_subplot(gs[0]) + ax_extfrag = fig.add_subplot(gs[1]) + ax_migration = fig.add_subplot(gs[2]) + ax_stats = fig.add_subplot(gs[3]) + + # Create visualizations + create_overlaid_compaction_graph(ax_compact, data_a, data_b, labels) + create_overlaid_extfrag_timeline(ax_extfrag, data_a, data_b, labels) + create_combined_migration_heatmap(ax_migration, data_a, data_b, labels) + create_comparison_statistics_table(ax_stats, data_a, data_b, labels) + + # Super title + fig.suptitle( + "Memory Fragmentation A/B Comparison Analysis", + fontsize=18, + fontweight="bold", + y=0.98, + ) + + # Footer + timestamp_text = f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}" + fig.text( + 0.98, + 0.01, + timestamp_text, + 
ha="right", + fontsize=9, + style="italic", + color="#7f8c8d", + ) + + # Adjust layout with more bottom margin for stats table + plt.subplots_adjust(left=0.08, right=0.95, top=0.94, bottom=0.05) + + # Save + if output_file is None: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_file = f"fragmentation_comparison_{timestamp}.png" + + plt.savefig(output_file, dpi=120, bbox_inches="tight", facecolor="#f8f9fa") + plt.close(fig) + return output_file + + +def main(): + parser = argparse.ArgumentParser( + description="Fragmentation analysis with optional comparison" + ) + parser.add_argument("input_file", help="Primary JSON file") + parser.add_argument( + "--compare", help="Secondary JSON file for A/B comparison (optional)" + ) + parser.add_argument("-o", "--output", help="Output filename") + parser.add_argument( + "--labels", + nargs=2, + default=["Light Load", "Heavy Load"], + help="Labels for the two datasets in comparison mode", + ) + parser.add_argument( + "--bin", type=float, default=0.5, help="Bin size for event counts" + ) + args = parser.parse_args() + + try: + data_a = load_data(args.input_file) + except Exception as e: + print(f"Error loading primary data: {e}") + sys.exit(1) + + if args.compare: + # Comparison mode + try: + data_b = load_data(args.compare) + except Exception as e: + print(f"Error loading comparison data: {e}") + sys.exit(1) + + out = create_comparison_dashboard(data_a, data_b, args.labels, args.output) + print(f"Comparison saved: {out}") + else: + # Single file mode + out = create_single_dashboard(data_a, args.output, args.bin) + print(f"Analysis saved: {out}") + + +if __name__ == "__main__": + main() diff --git a/playbooks/roles/monitoring/tasks/monitor_collect.yml b/playbooks/roles/monitoring/tasks/monitor_collect.yml index 5432fc879bd0..f57a4e9d8106 100644 --- a/playbooks/roles/monitoring/tasks/monitor_collect.yml +++ b/playbooks/roles/monitoring/tasks/monitor_collect.yml @@ -206,3 +206,129 @@ - 
monitor_developmental_stats|default(false)|bool - monitor_folio_migration|default(false)|bool - folio_migration_data_file.stat.exists|default(false) + +# Plot-fragmentation collection tasks +- name: Check if fragmentation monitoring was started + become: true + become_method: sudo + ansible.builtin.stat: + path: "{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}/fragmentation_tracker.pid" + register: fragmentation_pid_file + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Stop fragmentation monitoring + become: true + become_method: sudo + ansible.builtin.shell: | + output_dir="{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + if [ -f "${output_dir}/fragmentation_tracker.pid" ]; then + pid=$(cat "${output_dir}/fragmentation_tracker.pid") + if ps -p $pid > /dev/null 2>&1; then + kill -SIGINT $pid # Use SIGINT to allow graceful shutdown + sleep 2 # Give it time to save data + if ps -p $pid > /dev/null 2>&1; then + kill -SIGTERM $pid # Force kill if still running + fi + echo "Stopped fragmentation monitoring process $pid" + else + echo "Fragmentation monitoring process $pid was not running" + fi + rm -f "${output_dir}/fragmentation_tracker.pid" + fi + + # Save the end time + date +"%Y-%m-%d %H:%M:%S" > "${output_dir}/end_time.txt" + register: stop_fragmentation_monitor + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + - fragmentation_pid_file.stat.exists|default(false) + +- name: Display stop fragmentation monitoring status + ansible.builtin.debug: + msg: "{{ stop_fragmentation_monitor.stdout }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + - stop_fragmentation_monitor is defined + - stop_fragmentation_monitor.changed|default(false) + +- name: Generate fragmentation visualization + become: true + 
become_method: sudo + ansible.builtin.shell: | + cd /opt/fragmentation + output_dir="{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + + # Run the visualizer if data exists + if [ -f "${output_dir}/fragmentation_data.json" ]; then + python3 fragmentation_visualizer.py \ + "${output_dir}/fragmentation_data.json" \ + --output "${output_dir}/fragmentation_plot.png" 2>&1 | tee "${output_dir}/visualizer.log" + echo "Generated fragmentation visualization" + else + echo "No fragmentation data found to visualize" + fi + register: generate_fragmentation_plot + ignore_errors: true + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: List fragmentation monitoring output files + become: true + become_method: sudo + ansible.builtin.find: + paths: "{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + patterns: "*" + file_type: file + register: fragmentation_output_files + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Create local fragmentation results directory + ansible.builtin.file: + path: "{{ monitoring_results_path }}/fragmentation" + state: directory + delegate_to: localhost + run_once: true + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + - fragmentation_output_files.files is defined + - fragmentation_output_files.files | length > 0 + +- name: Copy fragmentation monitoring data to localhost + become: true + become_method: sudo + ansible.builtin.fetch: + src: "{{ item.path }}" + dest: "{{ monitoring_results_path }}/fragmentation/{{ ansible_hostname }}_{{ item.path | basename }}" + flat: true + validate_checksum: false + loop: "{{ fragmentation_output_files.files | default([]) }}" + when: + - monitor_developmental_stats|default(false)|bool + - 
monitor_memory_fragmentation|default(false)|bool + - fragmentation_output_files.files is defined + +- name: Display fragmentation monitoring collection summary + ansible.builtin.debug: + msg: | + Fragmentation monitoring collection complete. + {% if fragmentation_output_files.files is defined and fragmentation_output_files.files | length > 0 %} + Collected {{ fragmentation_output_files.files | length }} files + Data saved to: {{ monitoring_results_path }}/fragmentation/ + Files collected: + {% for file in fragmentation_output_files.files %} + - {{ ansible_hostname }}_{{ file.path | basename }} + {% endfor %} + {% else %} + No fragmentation data was collected. + {% endif %} + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool diff --git a/playbooks/roles/monitoring/tasks/monitor_run.yml b/playbooks/roles/monitoring/tasks/monitor_run.yml index f56d06e4facf..c563b38bc0b5 100644 --- a/playbooks/roles/monitoring/tasks/monitor_run.yml +++ b/playbooks/roles/monitoring/tasks/monitor_run.yml @@ -81,3 +81,126 @@ - monitor_folio_migration|default(false)|bool - folio_migration_stats_file.stat.exists|default(false) - monitor_status is defined + +# Plot-fragmentation monitoring tasks +- name: Install python3-bpfcc for fragmentation monitoring + become: true + become_method: sudo + ansible.builtin.package: + name: python3-bpfcc + state: present + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Install matplotlib for fragmentation visualization + become: true + become_method: sudo + ansible.builtin.pip: + name: matplotlib + state: present + executable: pip3 + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Create fragmentation scripts directory + become: true + become_method: sudo + ansible.builtin.file: + path: /opt/fragmentation + state: directory + mode: "0755" + when: + - 
monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Copy fragmentation monitoring scripts to target + become: true + become_method: sudo + ansible.builtin.copy: + src: "{{ item }}" + dest: "/opt/fragmentation/{{ item | basename }}" + mode: "0755" + loop: + - "{{ playbook_dir }}/roles/monitoring/files/fragmentation_tracker.py" + - "{{ playbook_dir }}/roles/monitoring/files/fragmentation_visualizer.py" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Create fragmentation monitoring output directory + become: true + become_method: sudo + ansible.builtin.file: + path: "{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + state: directory + mode: "0755" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Start fragmentation monitoring in background + become: true + become_method: sudo + ansible.builtin.shell: | + cd /opt/fragmentation + duration="{{ monitor_fragmentation_duration|default(0) }}" + output_dir="{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + + # Start the fragmentation tracker + if [ "$duration" -eq "0" ]; then + # Run continuously until killed + nohup python3 fragmentation_tracker.py > "${output_dir}/fragmentation_tracker.log" 2>&1 & + else + # Run for specified duration + nohup timeout ${duration} python3 fragmentation_tracker.py > "${output_dir}/fragmentation_tracker.log" 2>&1 & + fi + echo $! 
> "${output_dir}/fragmentation_tracker.pid" + + # Also save the start time for reference + date +"%Y-%m-%d %H:%M:%S" > "${output_dir}/start_time.txt" + async: 86400 # Run for up to 24 hours + poll: 0 + register: fragmentation_monitor + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Save fragmentation monitor async job ID + ansible.builtin.set_fact: + fragmentation_monitor_job: "{{ fragmentation_monitor.ansible_job_id }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + - fragmentation_monitor is defined + +- name: Verify fragmentation monitoring started successfully + become: true + become_method: sudo + ansible.builtin.shell: | + output_dir="{{ monitor_fragmentation_output_dir|default('/root/monitoring/fragmentation') }}" + if [ -f "${output_dir}/fragmentation_tracker.pid" ]; then + pid=$(cat "${output_dir}/fragmentation_tracker.pid") + if ps -p $pid > /dev/null 2>&1; then + echo "Fragmentation monitoring process $pid is running" + else + echo "ERROR: Fragmentation monitoring process $pid is not running" >&2 + exit 1 + fi + else + echo "ERROR: PID file not found" >&2 + exit 1 + fi + register: fragmentation_monitor_status + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + +- name: Display fragmentation monitoring status + ansible.builtin.debug: + msg: "{{ fragmentation_monitor_status.stdout }}" + when: + - monitor_developmental_stats|default(false)|bool + - monitor_memory_fragmentation|default(false)|bool + - fragmentation_monitor_status is defined -- 2.45.2
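For context when reviewing the hunks above: the visualizer calls two helpers, `build_counts()` and `get_migration_severity()`, that are defined earlier in the script and not visible in this part of the patch. A minimal sketch of how such helpers could look, inferred only from the call sites and the plot legend ("Bad (→UNMOVABLE)", "Good (→MOVABLE)") — these are assumptions, not the patch's actual implementations:

```python
import numpy as np


def build_counts(events, bin_size):
    """Bin event timestamps into fixed-width windows.

    Sketch based on the call sites: returns (bin_centers, counts)
    as NumPy arrays, both empty when there are no events.
    """
    if not events:
        return np.array([]), np.array([])
    times = np.array([e["timestamp"] for e in events])
    start, end = times.min(), times.max()
    # At least one bin, with edges aligned to multiples of bin_size
    n_bins = max(1, int(np.ceil((end - start) / bin_size)))
    edges = start + np.arange(n_bins + 1) * bin_size
    counts, edges = np.histogram(times, bins=edges)
    return (edges[:-1] + edges[1:]) / 2, counts


def get_migration_severity(from_type, to_type):
    """Hypothetical severity mapping mirroring the plot legend:
    migrations into UNMOVABLE pollute pageblocks (negative),
    migrations into MOVABLE are benign (positive), else neutral."""
    if to_type == "UNMOVABLE":
        return -2 if from_type == "MOVABLE" else -1
    if to_type == "MOVABLE":
        return 1
    return 0
```

With `bin_size=0.5` (the script's default), three events at 0.1 s, 0.2 s and 0.7 s would fall into two bins of counts 2 and 1, which is the shape `ax.plot(times, counts)` in the timeline functions expects.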