From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 884654279F6; Tue, 30 Jun 2026 14:23:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782829406; cv=none; b=eH5E8uYMy/D5uMn4EZiFcTzdsOJ5UeVFfTqvrCDM/UYC/lJ/0T/Dpc5iwT/0QiKtjGyf6GExG+XEfg3/UouU6gKqepMs4O9yp+iVULkn0mJiKIQ8JlOs0AE9/e39j8i02+0UAvR0Kisy/cjO6rbrKN2kd5AOl+KfVM7RyxQ/tzg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782829406; c=relaxed/simple; bh=4Mdk6PeJVz2qKvGO8sh40YuDeFqEaJZUp90ErRdeae4=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=YknTcq6S+P5/aP7NkRN76R/Bj6Bj8ZEGXp35h45XEypDi+cru5evaWhODPD3BUj64vrc/XlG4ZrY7gHsvDovw5dTIr6mKwN81hmbNQxM9+kvnWd6Ryy7vH+PgakwcTMip8J+cRAKxUkY8ug25ZxFuN1zN3nU1r0f//WUBV5YsgQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=AVINqA4+; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="AVINqA4+" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0FDC11F000E9; Tue, 30 Jun 2026 14:23:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782829405; bh=fUSz7eEHYpvlOzsm+PvIOL5UMpwSX4asWnIG8SXRQ9w=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=AVINqA4+0mKW2mbjg4vu8fM695FCCJx6R/nwzY2PZ9fiJJhPzAVXWL6lf61XNUJX/ YJgPmk14Czg6d1p3zQ6jRrrqJiVjg6hCw20WvCNSVB2/3Dg9hyUFMK01SFn0i6YZRA ke5H+Ol0omhT/06YZVLMeNP6fYuOXzNwNNlxEV118f0F4Tr2ws4GZeVIEezcbOhCz8 DTr5q5lTordecxRlTR2Dbk80KcMsX0bQKQlehHmdhtqGOoOAMc3So3Md5K+aaIi383 9ps14BNr62/RAx+azFG7eu+B8stugu9/fNb0XdtiwTXekuzayIMYXk3s6e8QJcq7CS q5oaNrL6Oi12Q== From: sashiko-bot@kernel.org Subject: Re: [PATCH 1/1] tools/bpftrace: Add a simple instrumented kernel profiler Reply-To: sashiko-reviews@lists.linux.dev To: "Aaron Lewis" Cc: bpf@vger.kernel.org, kvm@vger.kernel.org In-Reply-To: <20260630141152.3757722-2-aaronlewis@google.com> References: <20260630141152.3757722-1-aaronlewis@google.com> <20260630141152.3757722-2-aaronlewis@google.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 30 Jun 2026 14:23:24 +0000 Message-Id: <20260630142325.0FDC11F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 5 potential issue(= s) to consider: - [Medium] Incomplete timers (DURATION_UNSET) are blindly aggregated, corru= pting profile metrics. - [Medium] The `kvm_mmu_page_fault` probe incorrectly names its timer `kvm_= vcpu_block`. - [Medium] Two-pass parsing architecture breaks when reading from standard = input, causing an AttributeError. - [Medium] Fragile parsing logic crashes the script if any BPF trace map ev= ents are dropped. - [Low] File validation checks evaluate against closed file descriptors, re= ndering them dead code. -- commit 991cf3dc11faba4afe46fa2281e4850e5407d45b Author: Aaron Lewis tools/bpftrace: Add a simple instrumented kernel profiler This commit introduces a simple instrumented profiler for the kernel using BpfTrace. It allows arbitrary tracing of kernel functions and tracepoints, and includes a Python script to aggregate and display the results in a hierarchical view with timing data. > diff --git a/tools/bpftrace/instrumented_profiler/aggregate_bpf_results.p= y b/tools/bpftrace/instrumented_profiler/aggregate_bpf_results.py > new file mode 100644 > index 0000000000000..f4ffe8cff4418 > --- /dev/null > +++ b/tools/bpftrace/instrumented_profiler/aggregate_bpf_results.py [ ... ] > +def update_child(r: BpfResult, timer: BpfTimer, tid: int) -> int: > + i =3D r.child_index.get(timer.name) > + if i is None: > + i =3D len(r.children) > + r.children.append(BpfResult( > + timer.name, > + timer.start_time_us, > + timer.start_time_us + timer.duration_us, > + timer.duration_us, > + timer.duration_us, > + timer.duration_us, > + 1, > + 1, > + timer.depth, > + tid)) > + r.child_index[timer.name] =3D i > + else: > + r.children[i].wall_time_start_us =3D min(r.children[i].wall_time_sta= rt_us, timer.start_time_us) > + r.children[i].wall_time_end_us =3D max(r.children[i].wall_time_end_u= s, timer.start_time_us + timer.duration_us) > + r.children[i].thread_us +=3D timer.duration_us > + r.children[i].max_us =3D max(r.children[i].max_us, timer.duration_us) [Severity: Medium] When a traced function does not complete before tracing ends (for example, = if interrupted via Ctrl-C), the duration remains at its initialization value of DURATION_UNSET (-1 cast to uint64). Are we blindly aggregating incomplete timers here? Adding this massive sentinel value directly to thread_us and max_us will severely corrupt the aggregated timing metrics. [ ... ] > +def merge_recursive(r: BpfResult, tid: int, timers: List[BpfTimer], inde= x: int, child_index: int) -> int: > + while index < len(timers): > + while index < len(timers) and timers[index] =3D=3D None: > + index +=3D 1 > + > + if index >=3D len(timers): > + return index > + > + # The timer is a child of 'r'. (Child case) > + if timers[index].depth =3D=3D (r.depth + 1): > + # Add child > + child_index =3D update_child(r, timers[index], tid) > + > + # Now that the timer has been handled increment to the next one. > + index +=3D 1 > + > + # The timer is a child of 'r'. (Grandchild case) > + elif timers[index].depth =3D=3D (r.depth + 2): > + if child_index < 0: > + raise ValueError("Invalid 'child_index'") > + # Recurse down the child becuase that is the parent this timer. > + index =3D merge_recursive(r.children[child_index], tid, timers, in= dex, child_index) > + > + # The timer is a descendant of 'r', but not the child... Invalid cas= e! > + elif timers[index].depth > (r.depth + 2): > + raise ValueError("Invalid depth. descendant can only be children.= name =3D '%s', tid =3D %d, index =3D %d" % (timers[index].name, tid, index= )) [Severity: Medium] Will this script crash with a ValueError or IndexError if bpftrace drops any map events due to ring buffer overflow or map eviction under heavy load? The parsing logic appears to assume a flawless 1:1 correlation across all dumped maps. If events are dropped, it might create a gap in the call depth triggering this strict ValueError, or result in an IndexError during array assignment in generate_from_log() when index expectations are broken. [ ... ] > +def generate_from_log(files: Sequence[str]): > + """Read the trace and return a list of ProfilerResult objects.""" > + > + timer_counts =3D {} > + all_timers =3D {} > + > + count_regex =3D re.compile( > + r"^@timer_count\[(?P[0-9]+)\]:\W(?P[-0-9]+)" > + ) > + name_regex =3D re.compile( > + r"^@timer_name\[(?P[0-9]+),\W*(?P[0-9]+)\]:\W(?P= [\w\.0-9]+)" > + ) > + start_time_regex =3D re.compile( > + r"^@timer_start\[(?P[0-9]+),\W*(?P[0-9]+)\]:\W(?P[-0-9]+)" > + ) > + duration_regex =3D re.compile( > + r"^@timer_duration\[(?P[0-9]+),\W*(?P[0-9]+)\]:\W(?P[-0-9]+)" > + ) > + depth_regex =3D re.compile( > + r"^@timer_depth\[(?P[0-9]+),\W*(?P[0-9]+)\]:\W(?P[-0-9]+)" > + ) > + > + attaching_regex =3D re.compile( > + r"^Attaching.*" > + ) > + > + # Get all counts. That way we know how many timers each thread used. > + for line in fileinput.input(files=3Dfiles): [Severity: Medium] Could this two-pass parsing architecture break when reading trace data dire= ctly from standard input? If a user pipes bpftrace output into this script, this first loop will cons= ume the entire stream until EOF. The second fileinput.input() loop below will then read nothing, leaving the timer arrays filled with None values. This leads to an AttributeError in merge_timers() when attempting to read timers[0].depth. > + line =3D line.rstrip() > + > + m =3D re.fullmatch(duration_regex, line) > + if m: > + tid =3D int(m.group("tid")) > + index =3D int(m.group("index")) > + duration_str =3D m.group("duration_us") > + duration =3D int(m.group("duration_us")) > + > + if not tid in timer_counts: > + timer_counts[tid] =3D index > + else: > + timer_counts[tid] =3D max(timer_counts[tid], index) > + > + # Allocate enough timers to fit the trace. > + for tid, count in timer_counts.items(): > + all_timers[tid] =3D [None] * (count + 1) > + > + # Populate the trace. > + for line in fileinput.input(files=3Dfiles): [ ... ] > + root =3D BpfResult("", 0, 0, 0, 0, 0, 1, 1, -1, -1) > + results =3D [] > + for tid, timers in all_timers.items(): > + merge_timers(root, tid, timers) > + print_results(root, results) > + > + if fileinput.fileno() =3D=3D 0: > + raise ValueError("Expected a file to read.") [Severity: Low] Does this file validation check evaluate against closed file descriptors? The fileinput loops have already exhausted and closed the files by this poi= nt. Calling fileinput.fileno() returns the active UNIX file descriptor, which is -1 when closed, not the count of parsed files. Both this check and the one below will always evaluate to false, rendering this block dead code. > + > + if fileinput.fileno() > 1: > + raise ValueError("Too many files read. Expected 1") > + > +def main() -> None: [ ... ] > diff --git a/tools/bpftrace/instrumented_profiler/samples/kvm_trace.bt b/= tools/bpftrace/instrumented_profiler/samples/kvm_trace.bt > new file mode 100644 > index 0000000000000..bd341e408570e > --- /dev/null > +++ b/tools/bpftrace/instrumented_profiler/samples/kvm_trace.bt [ ... ] > +/* > + * Time 'kvm_mmu_page_fault()'. > + * > + * Check: sudo grep kvm_mmu_page_fault /proc/kallsyms > + */ > +kfunc:kvm_mmu_page_fault > +{ > + START_TIMER("kvm_vcpu_block", > + @timer_name, @trace_stack, @timer_start, @timer_duration, > + @timer_depth, @timer_count, @trace_stack_top); > +} [Severity: Medium] Is there a copy-paste error here where the timer for kvm_mmu_page_fault is incorrectly named kvm_vcpu_block? This will falsely attribute page fault durations to kvm_vcpu_block, distort= ing the captured metrics for the profile. > +kretfunc:kvm_mmu_page_fault > +{ > + if (@trace_stack_top[tid] > 0) { > + STOP_TIMER(@trace_stack_top, @trace_stack, @timer_duration, @timer_sta= rt); > + } > +} --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260630141152.3757= 722-2-aaronlewis@google.com?part=3D1