* [PATCH v4 29/58] perf futex-contention: Port futex-contention to use python module
2026-04-23 19:43 ` [PATCH v4 28/58] perf syscall-counts-by-pid: Port syscall-counts-by-pid to use python module Ian Rogers
@ 2026-04-23 19:43 ` Ian Rogers
2026-04-23 19:43 ` [PATCH v4 30/58] perf flamegraph: Port flamegraph " Ian Rogers
1 sibling, 0 replies; 3+ messages in thread
From: Ian Rogers @ 2026-04-23 19:43 UTC (permalink / raw)
To: acme, adrian.hunter, james.clark, leo.yan, namhyung, tmricht
Cc: alice.mei.rogers, dapeng1.mi, linux-arm-kernel, linux-kernel,
linux-perf-users, mingo, peterz, Ian Rogers
Rewrite tools/perf/scripts/python/futex-contention.py to use the
python module, along with various style changes. By avoiding the
overheads of the `perf script` execution, performance improves by more
than 3.2x, as shown in the following (with PYTHONPATH and
PERF_EXEC_PATH set as necessary):
```
$ perf record -e syscalls:sys_*_futex -a sleep 1
...
$ time perf script tools/perf/scripts/python/futex-contention.py
Install the python-audit package to get syscall names.
For example:
# apt-get install python3-audit (Ubuntu)
# yum install python3-audit (Fedora)
etc.
Press control+C to stop and show the summary
aaa/4[2435653] lock 7f76b380c878 contended 1 times, 1099 avg ns [max: 1099 ns, min 1099 ns]
...
real 0m1.007s
user 0m0.935s
sys 0m0.072s
$ time python3 tools/perf/python/futex-contention.py
...
real 0m0.314s
user 0m0.259s
sys 0m0.056s
```
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
v2:
1. Fixed Module Import Failure: Corrected the type annotations from
[int, int] to Tuple[int, int]. The previous code would raise a
TypeError at module import time because lists cannot be used as
types in dictionary annotations.
2. Prevented Out-Of-Memory Crashes: Replaced the approach of storing
every single duration in a list with a LockStats class that
maintains running aggregates (count, total time, min, max). This
ensures O(1) memory usage per lock/thread pair rather than
unbounded memory growth.
3. Support for Custom Input Files: Added a -i / --input command-line
argument to support processing arbitrarily named trace files,
removing the hardcoded "perf.data" restriction.
4. Robust Process Lookup: Added a check to ensure session is
initialized before calling session.process(), preventing
potential NoneType attribute errors if events are processed during
initialization.
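The annotation failure described in item 1 is easy to reproduce in
isolation. A minimal sketch (illustrative names only, not the patch
code itself):

```python
from typing import Dict, Tuple

# A bare list is not a type, so subscripting a generic alias with it
# raises TypeError as soon as the annotation is evaluated, i.e. at
# module import time.
try:
    bad = Dict[int, [int, int]]
except TypeError as err:
    print("rejected:", err)

# Tuple[int, int] is the correct way to name a (uaddr, start_time) pair.
start_times: Dict[int, Tuple[int, int]] = {}
start_times[2435653] = (0x7F76B380C878, 1099)
print(start_times[2435653])
```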
---
tools/perf/python/futex-contention.py | 87 +++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
create mode 100755 tools/perf/python/futex-contention.py
diff --git a/tools/perf/python/futex-contention.py b/tools/perf/python/futex-contention.py
new file mode 100755
index 000000000000..7c5c3d0ca60a
--- /dev/null
+++ b/tools/perf/python/futex-contention.py
@@ -0,0 +1,87 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""Measures futex contention."""
+
+import argparse
+from collections import defaultdict
+from typing import Dict, Tuple
+import perf
+
+class LockStats:
+ """Aggregate lock contention information."""
+ def __init__(self) -> None:
+ self.count = 0
+ self.total_time = 0
+ self.min_time = 0
+ self.max_time = 0
+
+ def add(self, duration: int) -> None:
+ """Add a new duration measurement."""
+ self.count += 1
+ self.total_time += duration
+ if self.count == 1:
+ self.min_time = duration
+ self.max_time = duration
+ else:
+ self.min_time = min(self.min_time, duration)
+ self.max_time = max(self.max_time, duration)
+
+ def avg(self) -> float:
+ """Return average duration."""
+ return self.total_time / self.count if self.count > 0 else 0.0
+
+process_names: Dict[int, str] = {}
+start_times: Dict[int, Tuple[int, int]] = {}
+session = None
+durations: Dict[Tuple[int, int], LockStats] = defaultdict(LockStats)
+
+FUTEX_WAIT = 0
+FUTEX_WAKE = 1
+FUTEX_PRIVATE_FLAG = 128
+FUTEX_CLOCK_REALTIME = 256
+FUTEX_CMD_MASK = ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
+
+
+def process_event(sample: perf.sample_event) -> None:
+ """Process a single sample event."""
+ def handle_start(tid: int, uaddr: int, op: int, start_time: int) -> None:
+ if (op & FUTEX_CMD_MASK) != FUTEX_WAIT:
+ return
+ if tid not in process_names:
+ try:
+ if session:
+ process = session.process(tid)
+ if process:
+ process_names[tid] = process.comm()
+ except (TypeError, AttributeError):
+ return
+ start_times[tid] = (uaddr, start_time)
+
+ def handle_end(tid: int, end_time: int) -> None:
+ if tid not in start_times:
+ return
+ (uaddr, start_time) = start_times[tid]
+ del start_times[tid]
+ durations[(tid, uaddr)].add(end_time - start_time)
+
+ event_name = str(sample.evsel)
+ if event_name == "evsel(syscalls:sys_enter_futex)":
+ uaddr = getattr(sample, "uaddr", 0)
+ op = getattr(sample, "op", 0)
+ handle_start(sample.sample_tid, uaddr, op, sample.sample_time)
+ elif event_name == "evsel(syscalls:sys_exit_futex)":
+ handle_end(sample.sample_tid, sample.sample_time)
+
+
+if __name__ == "__main__":
+ ap = argparse.ArgumentParser(description="Measure futex contention")
+ ap.add_argument("-i", "--input", default="perf.data", help="Input file name")
+ args = ap.parse_args()
+
+ session = perf.session(perf.data(args.input), sample=process_event)
+ session.process_events()
+
+ for ((t, u), stats) in sorted(durations.items()):
+ avg_ns = stats.avg()
+ print(f"{process_names.get(t, 'unknown')}[{t}] lock {u:x} contended {stats.count} times, "
+ f"{avg_ns:.0f} avg ns [max: {stats.max_time} ns, min {stats.min_time} ns]")
--
2.54.0.rc2.533.g4f5dca5207-goog
^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH v4 30/58] perf flamegraph: Port flamegraph to use python module
2026-04-23 19:43 ` [PATCH v4 28/58] perf syscall-counts-by-pid: Port syscall-counts-by-pid to use python module Ian Rogers
2026-04-23 19:43 ` [PATCH v4 29/58] perf futex-contention: Port futex-contention " Ian Rogers
@ 2026-04-23 19:43 ` Ian Rogers
1 sibling, 0 replies; 3+ messages in thread
From: Ian Rogers @ 2026-04-23 19:43 UTC (permalink / raw)
To: acme, adrian.hunter, james.clark, leo.yan, namhyung, tmricht
Cc: alice.mei.rogers, dapeng1.mi, linux-arm-kernel, linux-kernel,
linux-perf-users, mingo, peterz, Ian Rogers
Add a port of the flamegraph script that uses the perf python module
directly. This approach is significantly faster than using perf script
callbacks as it avoids creating intermediate dictionaries for all
event fields.
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
---
v2:
1. Performance Optimization: Changed Node.children from a list to a
dictionary, reducing the lookup time in find_or_create_node from
O(N) to O(1) and avoiding performance bottlenecks on wide call
graphs.
2. Callchain Fallback: Added a fallback to use the sample's top-level
symbol or instruction pointer if no callchain is present, ensuring
the script still generates meaningful output rather than just
process names.
3. Template Downloading Fix: Corrected the logic handling the
--allow-download flag and custom HTTP URLs. It no longer warns
about missing local files when a URL is provided, and won't
silently overwrite custom URLs with the default one.
4. Output Stream Separation: Moved informational warnings to
sys.stderr to prevent them from corrupting the resulting HTML/JSON
file when the user streams the output to stdout (e.g., using -o -).
5. XSS Protection: Added basic HTML entity escaping for <, >, and &
within the embedded JSON data blocks. This mitigates the risk of
cross-site scripting if trace data contains maliciously formed
process or symbol names.
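The escaping in item 5 works because \uXXXX escapes are legal inside
JSON strings, so the substitution neutralizes HTML-significant
characters without invalidating the embedded JSON. A standalone sketch
of the technique (not the patch code itself):

```python
import json

# A maliciously formed comm/symbol name that would otherwise be
# emitted verbatim inside a <script> block.
data = {"n": "<script>alert('comm')</script> & friends"}

s = json.dumps(data)
# Replace HTML-significant characters with JSON unicode escapes.
s = s.replace("&", "\\u0026").replace("<", "\\u003c").replace(">", "\\u003e")

assert "<" not in s and ">" not in s and "&" not in s
# The escaped form is still valid JSON and round-trips to the original.
assert json.loads(s) == data
print(s)
```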
---
tools/perf/python/flamegraph.py | 250 ++++++++++++++++++++++++++++++++
1 file changed, 250 insertions(+)
create mode 100755 tools/perf/python/flamegraph.py
diff --git a/tools/perf/python/flamegraph.py b/tools/perf/python/flamegraph.py
new file mode 100755
index 000000000000..f3f69e5a88c2
--- /dev/null
+++ b/tools/perf/python/flamegraph.py
@@ -0,0 +1,250 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""
+flamegraph.py - create flame graphs from perf samples using perf python module
+"""
+
+import argparse
+import hashlib
+import json
+import os
+import subprocess
+import sys
+import urllib.request
+from typing import Dict, Optional, Union
+import perf
+
+MINIMAL_HTML = """<head>
+ <link rel="stylesheet" type="text/css" href="https://cdn.jsdelivr.net/npm/d3-flame-graph@4.1.3/dist/d3-flamegraph.css">
+</head>
+<body>
+ <div id="chart"></div>
+ <script type="text/javascript" src="https://d3js.org/d3.v7.js"></script>
+ <script type="text/javascript" src="https://cdn.jsdelivr.net/npm/d3-flame-graph@4.1.3/dist/d3-flamegraph.min.js"></script>
+ <script type="text/javascript">
+ const stacks = [/** @flamegraph_json **/];
+ // Note, options is unused.
+ const options = [/** @options_json **/];
+
+ var chart = flamegraph();
+ d3.select("#chart")
+ .datum(stacks[0])
+ .call(chart);
+ </script>
+</body>
+"""
+
+class Node:
+ """A node in the flame graph tree."""
+ def __init__(self, name: str, libtype: str):
+ self.name = name
+ self.libtype = libtype
+ self.value: int = 0
+ self.children: dict[str, Node] = {}
+
+ def to_json(self) -> Dict[str, Union[str, int, list[Dict]]]:
+ """Convert the node to a JSON-serializable dictionary."""
+ return {
+ "n": self.name,
+ "l": self.libtype,
+ "v": self.value,
+ "c": [x.to_json() for x in self.children.values()]
+ }
+
+
+class FlameGraphCLI:
+ """Command-line interface for generating flame graphs."""
+ def __init__(self, args):
+ self.args = args
+ self.stack = Node("all", "root")
+ self.session = None
+
+ @staticmethod
+ def get_libtype_from_dso(dso: Optional[str]) -> str:
+ """Determine the library type from the DSO name."""
+ if dso and (dso == "[kernel.kallsyms]" or dso.endswith("/vmlinux") or dso == "[kernel]"):
+ return "kernel"
+ return ""
+
+ @staticmethod
+ def find_or_create_node(node: Node, name: str, libtype: str) -> Node:
+ """Find a child node with the given name or create a new one."""
+ if name in node.children:
+ return node.children[name]
+ child = Node(name, libtype)
+ node.children[name] = child
+ return child
+
+ def process_event(self, sample) -> None:
+ """Process a single perf sample event."""
+ if self.args.event_name and str(sample.evsel) != self.args.event_name:
+ return
+
+ pid = sample.sample_pid
+ dso_type = ""
+ try:
+ thread = self.session.process(sample.sample_tid)
+ comm = thread.comm()
+ except Exception:
+ comm = "[unknown]"
+
+ if pid == 0:
+ comm = "swapper"
+ dso_type = "kernel"
+ else:
+ comm = f"{comm} ({pid})"
+
+ node = self.find_or_create_node(self.stack, comm, dso_type)
+
+ callchain = sample.callchain
+ if callchain:
+ # We want to traverse from root to leaf.
+ # perf callchain iterator gives leaf to root.
+ # We collect them and reverse.
+ frames = list(callchain)
+ for entry in reversed(frames):
+ name = entry.symbol or "[unknown]"
+ libtype = self.get_libtype_from_dso(entry.dso)
+ node = self.find_or_create_node(node, name, libtype)
+ else:
+ # Fallback if no callchain
+ name = getattr(sample, "symbol", "[unknown]")
+ libtype = self.get_libtype_from_dso(getattr(sample, "dso", "[unknown]"))
+ node = self.find_or_create_node(node, name, libtype)
+
+ node.value += 1
+
+ def get_report_header(self) -> str:
+ """Get the header from the perf report."""
+ try:
+ input_file = self.args.input or "perf.data"
+ output = subprocess.check_output(["perf", "report", "--header-only", "-i", input_file])
+ result = output.decode("utf-8")
+ if self.args.event_name:
+ result += "\nFocused event: " + self.args.event_name
+ return result
+ except Exception:
+ return ""
+
+ def run(self) -> None:
+ """Run the flame graph generation."""
+ input_file = self.args.input or "perf.data"
+ if not os.path.exists(input_file):
+ print(f"Error: {input_file} not found. (try 'perf record' first)", file=sys.stderr)
+ sys.exit(1)
+
+ try:
+ self.session = perf.session(perf.data(input_file),
+ sample=self.process_event)
+ except Exception as e:
+ print(f"Error opening session: {e}", file=sys.stderr)
+ sys.exit(1)
+
+ self.session.process_events()
+
+ stacks_json = json.dumps(self.stack, default=lambda x: x.to_json())
+ # Escape HTML special characters to prevent XSS
+ stacks_json = stacks_json.replace("<", "\\u003c") \
+ .replace(">", "\\u003e").replace("&", "\\u0026")
+
+ if self.args.format == "html":
+ report_header = self.get_report_header()
+ options = {
+ "colorscheme": self.args.colorscheme,
+ "context": report_header
+ }
+ options_json = json.dumps(options)
+ options_json = options_json.replace("<", "\\u003c") \
+ .replace(">", "\\u003e").replace("&", "\\u0026")
+
+ template = self.args.template
+ template_md5sum = None
+ output_str = None
+
+ if not os.path.isfile(template):
+ if template.startswith("http://") or template.startswith("https://"):
+ if not self.args.allow_download:
+ print("Warning: Downloading templates is disabled. "
+ "Use --allow-download.", file=sys.stderr)
+ template = None
+ else:
+ print(f"Warning: Template file '{template}' not found.", file=sys.stderr)
+ if self.args.allow_download:
+ print("Using default CDN template.", file=sys.stderr)
+ template = (
+ "https://cdn.jsdelivr.net/npm/d3-flame-graph@4.1.3/dist/templates/"
+ "d3-flamegraph-base.html"
+ )
+ template_md5sum = "143e0d06ba69b8370b9848dcd6ae3f36"
+ else:
+ template = None
+
+ use_minimal = False
+ try:
+ if not template:
+ use_minimal = True
+ elif template.startswith("http"):
+ with urllib.request.urlopen(template) as url_template:
+ output_str = "".join([l.decode("utf-8") for l in url_template.readlines()])
+ else:
+ with open(template, "r", encoding="utf-8") as f:
+ output_str = f.read()
+ except Exception as err:
+ print(f"Error reading template {template}: {err}\n", file=sys.stderr)
+ use_minimal = True
+
+ if use_minimal:
+            print("Using internal minimal HTML that refers to d3's web site. JavaScript " +
+                  "loaded this way from a local file may be blocked unless your " +
+                  "browser has relaxed permissions. Run with '--allow-download' to fetch " +
+                  "the full D3 HTML template.", file=sys.stderr)
+ output_str = MINIMAL_HTML
+
+ elif template_md5sum:
+ assert output_str is not None
+ download_md5sum = hashlib.md5(output_str.encode("utf-8")).hexdigest()
+ if download_md5sum != template_md5sum:
+ s = None
+ while s not in ["y", "n"]:
+ s = input(f"""Unexpected template md5sum.
+{download_md5sum} != {template_md5sum}, for:
+{output_str}
+continue?[yn] """).lower()
+ if s == "n":
+ sys.exit(1)
+
+ assert output_str is not None
+ output_str = output_str.replace("/** @options_json **/", options_json)
+ output_str = output_str.replace("/** @flamegraph_json **/", stacks_json)
+ output_fn = self.args.output or "flamegraph.html"
+ else:
+ output_str = stacks_json
+ output_fn = self.args.output or "stacks.json"
+
+ if output_fn == "-":
+ sys.stdout.write(output_str)
+ else:
+ print(f"dumping data to {output_fn}")
+ with open(output_fn, "w", encoding="utf-8") as out:
+ out.write(output_str)
+
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser(description="Create flame graphs using perf python module.")
+ parser.add_argument("-f", "--format", default="html", choices=["json", "html"],
+ help="output file format")
+ parser.add_argument("-o", "--output", help="output file name")
+ parser.add_argument("--template",
+ default="/usr/share/d3-flame-graph/d3-flamegraph-base.html",
+ help="path to flame graph HTML template")
+ parser.add_argument("--colorscheme", default="blue-green",
+ help="flame graph color scheme", choices=["blue-green", "orange"])
+ parser.add_argument("-i", "--input", help="input perf.data file")
+ parser.add_argument("--allow-download", default=False, action="store_true",
+ help="allow unprompted downloading of HTML template")
+ parser.add_argument("-e", "--event", default="", dest="event_name", type=str,
+ help="specify the event to generate flamegraph for")
+
+ cli_args = parser.parse_args()
+ cli = FlameGraphCLI(cli_args)
+ cli.run()
--
2.54.0.rc2.533.g4f5dca5207-goog