From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH v7 37/59] perf export-to-postgresql: Port export-to-postgresql to use python module
Reply-To: sashiko@lists.linux.dev
To: "Ian Rogers"
Cc: linux-perf-users@vger.kernel.org
In-Reply-To: <20260425224951.174663-38-irogers@google.com>
References: <20260425224951.174663-38-irogers@google.com>
Date: Sat, 25 Apr 2026 23:13:23 +0000
Message-Id: <20260425231324.6443BC2BCB0@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org

Sashiko AI review found 9 potential issues:

- [High] The script silently exports 0 for many critical sample fields due to incorrect attribute names and ignored branch stacks.
- [High] The call path tree constructed for PostgreSQL is structurally inverted, placing leaf functions at the root.
- [High] The script ignores the final result status of the PostgreSQL COPY FROM STDIN command, which can silently mask data loss.
- [High] Using the 'q' (signed 64-bit integer) format character in struct.pack for 64-bit memory addresses crashes the exporter on kernel samples.
- [High] The script drops all of the logic for exporting synthetic Intel PT events and context switches, despite advertising Intel PT support.
- [Medium] String values are packed dynamically without length validation, leading to brittle behavior and database import failures.
- [Medium] The ported script drops the database foreign key constraints that were present in the original implementation.
- [Medium] The `comm_threads_view` view is documented in the file header but is never created.
- [Low] Exceptions in setup_db() bypass cleanup logic, leaving temporary directories on disk.
--
commit 6da42dc06630d58ef6cdd8bf94f7bec7384e8f74
Author: Ian Rogers

    perf export-to-postgresql: Port export-to-postgresql to use python module

    Ported from tools/perf/scripts/python/export-to-postgresql.py to use
    the perf Python module API.

> diff --git a/tools/perf/python/export-to-postgresql.py b/tools/perf/python/export-to-postgresql.py
> new file mode 100755
> index 000000000000..0118dc348b1e
> --- /dev/null
> +++ b/tools/perf/python/export-to-postgresql.py
> @@ -0,0 +1,701 @@
> +#!/usr/bin/env python3
> +# SPDX-License-Identifier: GPL-2.0
> +r"""
> +Export perf data to a postgresql database.

[ ... ]

> +An example of using this script with Intel PT:
> +
> +    $ perf record -e intel_pt//u ls
> +    $ python tools/perf/python/export-to-postgresql.py -i perf.data -o pt_example

Does this script still support exporting synthetic Intel PT events and
context switches? The docstring advertises an example with Intel PT, but the
logic and table schemas for ptwrite, cbr, mwait, pwre, exstop, pwrx, and
context_switches appear to be missing.

[ ... ]

> +Views:
> +
> +    Most of the tables have views for more friendly display. The views are:
> +
> +    comm_threads_view

This isn't a bug, but comm_threads_view is listed in the views documentation
here yet never seems to be created in finalize().

[ ... ]

> +    def copy_output_file(self, path_name: str, table_name: str):
> +        """Copy intermediate file to database."""
> +        sql = f"COPY {table_name} FROM STDIN (FORMAT 'binary')"
> +        res = PQexec(self.conn, toserverstr(sql))

[ ... ]

> +        res = PQgetResult(self.conn)
> +        while res:
> +            PQclear(res)
> +            res = PQgetResult(self.conn)

Does this loop silently ignore the final result status of the PostgreSQL COPY
command? If PostgreSQL rejects the data due to a constraint violation,
PQgetResult() yields PGRES_FATAL_ERROR, but this loop clears the result
without inspecting PQresultStatus(), potentially masking silent data loss.

[ ...
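]

An untested sketch of a drain loop that checks every result status before clearing it. The PGRES_FATAL_ERROR value follows libpq's ExecStatusType; the callables here are stand-ins for the script's actual libpq bindings so the shape of the fix is visible in isolation:

```python
# Sketch only: drain all results after a COPY finishes, but record each
# status instead of discarding it (stand-in for PQgetResult/PQresultStatus/
# PQclear as bound by the script).
PGRES_FATAL_ERROR = 7  # libpq ExecStatusType value

def drain_copy_results(get_result, result_status, clear_result):
    """Drain pending results, failing if any status is fatal."""
    ok = True
    res = get_result()
    while res is not None:
        if result_status(res) == PGRES_FATAL_ERROR:
            ok = False  # e.g. a constraint violation during COPY
        clear_result(res)
        res = get_result()
    return ok

# Simulated run: one fatal result left on the connection.
statuses = [PGRES_FATAL_ERROR]
ok = drain_copy_results(
    get_result=lambda: statuses.pop(0) if statuses else None,
    result_status=lambda s: s,
    clear_result=lambda s: None,
)
assert not ok  # the failure is now visible instead of silently cleared
```

With this shape, copy_output_file() could raise or log when the drain reports a failure rather than returning successfully.

[ ...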
]

> +    def write_comm(self, comm_id: int, comm_str: str, thread_id: int,
> +                   time: int, exec_flag: int) -> None:
> +        """Write comm to binary file."""
> +        comm_bytes = toserverstr(comm_str)
> +        n = len(comm_bytes)
> +        fmt = "!hiqi" + str(n) + "s" + "iqiqiB"
> +        value = struct.pack(fmt, 5, 8, comm_id, n, comm_bytes, 8,
> +                            thread_id, 8, time, 1, exec_flag)
> +        self.files['comm'].write(value)

Are the dynamically packed strings validated against the database schema
limits? The comms table defines comm as varchar(16), but unbounded strings
are packed here. If a string exceeds the limit, will PostgreSQL abort the
COPY FROM STDIN command?

[ ... ]

> +    def write_symbol(self, symbol_id: int, dso_id: int, sym_start: int,
> +                     sym_end: int, binding: int, symbol_name: str) -> None:
> +        """Write symbol to binary file."""
> +        name_bytes = toserverstr(symbol_name)
> +        n = len(name_bytes)
> +        fmt = "!hiqiqiqiqiii" + str(n) + "s"
> +        value = struct.pack(fmt, 6, 8, symbol_id, 8, dso_id, 8,
> +                            sym_start, 8, sym_end, 4, binding, n, name_bytes)
> +        self.files['symbol'].write(value)

Will using the 'q' (signed 64-bit integer) format character in struct.pack
crash the exporter when processing kernel-space instruction pointers? Since
kernel addresses can exceed the maximum positive value of a signed 64-bit
integer, should the 'Q' (unsigned 64-bit) specifier be used instead?

[ ...
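]

A quick illustration of the overflow (pack_u64 is a hypothetical helper, not from the patch). PostgreSQL's binary-format bigint is signed, so packing with 'Q' emits the same 8 big-endian bytes and the value simply round-trips as a negative bigint:

```python
import struct

KERNEL_IP = 0xffffffff81000000  # a typical x86-64 kernel text address

# 'q' (signed 64-bit) rejects any address >= 2**63:
try:
    struct.pack("!q", KERNEL_IP)
    raised = False
except struct.error:
    raised = True
assert raised

# Hypothetical helper: 'Q' (unsigned 64-bit) accepts the full range;
# the bytes are identical two's-complement, so PostgreSQL reads them
# back as a (negative) signed bigint without data loss.
def pack_u64(value: int) -> bytes:
    return struct.pack("!Q", value & 0xFFFFFFFFFFFFFFFF)

assert pack_u64(KERNEL_IP) == b"\xff\xff\xff\xff\x81\x00\x00\x00"
```

[ ...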
]

> +    def write_sample(self, sample_id: int, evsel_id: int, thread_id: int,
> +                     comm_id: int, dso_id: int, symbol_id: int,
> +                     sample: perf.sample_event, call_path_id: int) -> None:
> +        """Write sample to binary file."""
> +        value = struct.pack(
> +            "!hiqiqiqiqiqiqiqiqiqiqiiiqiqiqiqiqiqiqiqiiiBiqiqiqii",
> +            25, 8, sample_id, 8, evsel_id, 8, 0, 8, thread_id, 8, comm_id,
> +            8, dso_id, 8, symbol_id, 8, getattr(sample, 'sym_offset', 0),
> +            8, sample.sample_ip, 8, sample.sample_time, 4, sample.sample_cpu,
> +            8, 0, 8, 0, 8, 0, 8, 0,
> +            8, getattr(sample, 'sample_period', 0) or 0,
> +            8, getattr(sample, 'sample_weight', 0) or 0,
> +            8, getattr(sample, 'transaction_', 0) or 0,
> +            8, getattr(sample, 'data_src', 0) or 0,
> +            4, 0,
> +            1, getattr(sample, 'in_tx', 0) or 0,
> +            8, call_path_id,
> +            8, getattr(sample, 'insn_count', 0) or 0,
> +            8, getattr(sample, 'cyc_count', 0) or 0,
> +            4, getattr(sample, 'flags', 0) or 0
> +        )
> +        self.files['sample'].write(value)

Are the attribute names for the sample fields correct, and are branch targets
being handled? The fields data_src, insn_count, and cyc_count differ from the
actual sample_data_src, sample_insn_count, and sample_cyc_count names.
Additionally, the branch target fields (to_ip, to_dso_id, to_symbol_id,
to_sym_offset) are hardcoded to 0 instead of being extracted from
sample.brstack. Will this result in missing sample fields being silently
exported as zeroes?

[ ... ]

> +    def process_event(self, sample: perf.sample_event) -> None:
> +        """Callback for processing events."""

[ ...
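]

To make the write_sample failure mode above concrete: getattr() with a misspelled name never raises, it just returns the fallback, so every sample silently gets a 0 column. FakeSample is a stand-in here, with attribute names per the sample_* convention the review claims the module uses:

```python
# Stand-in for perf.sample_event; the sample_* attribute names are an
# assumption based on the naming convention cited in the review.
class FakeSample:
    sample_data_src = 0x68100142
    sample_insn_count = 7

s = FakeSample()

# The wrong name silently yields the default, so the exported column is 0:
assert getattr(s, 'data_src', 0) == 0
assert getattr(s, 'insn_count', 0) == 0

# The (claimed) correct names preserve the data:
assert getattr(s, 'sample_data_src', 0) == 0x68100142
assert getattr(s, 'sample_insn_count', 0) == 7
```

This is why hasattr()/getattr() fallbacks are risky for schema exports: a rename upstream degrades into zeroes rather than an error.

[ ...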
]

> +        call_path_id = 0
> +        if hasattr(sample, 'callchain') and sample.callchain:
> +            parent_id = 0
> +            for node in sample.callchain:
> +                node_dso = getattr(node, 'dso', None) or getattr(node, 'map', None)
> +                node_symbol = getattr(node, 'symbol', None) or getattr(node, 'sym', None)
> +
> +                dso_name = "Unknown_dso"
> +                if node_dso:
> +                    dso_name = getattr(node_dso, 'name', "Unknown_dso") or "Unknown_dso"
> +
> +                symbol_name = "Unknown_symbol"
> +                if node_symbol:
> +                    symbol_name = getattr(node_symbol, 'name', "Unknown_symbol") or "Unknown_symbol"
> +
> +                node_dso_id = self.get_dso_id(dso_name, dso_name, "")
> +                node_symbol_id = self.get_symbol_id(node_dso_id, symbol_name, 0, 0)
> +
> +                parent_id = self.get_call_path_id(parent_id, node_symbol_id, node.ip)
> +                call_path_id = parent_id

Is the call path tree constructed upside-down here? By iterating sequentially
over sample.callchain (from index 0 upwards) and using the newly generated ID
as the parent_id for the next frame, the caller becomes a child of the
callee. Should this be iterating in reverse to properly build parent-child
aggregation?

[ ... ]

> +    def finalize(self) -> None:
> +        """Copy files to database and add keys/views."""

[ ...
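]

Going back to the call-path construction above: a minimal sketch of the reversed walk, assuming callchains arrive leaf-first (index 0 is the innermost frame). get_call_path_id is a stub here that just records the (parent_id, node) pairs it is asked to create:

```python
# Sketch: walk the chain outermost-caller-first so each frame becomes a
# child of its caller, not the other way around.
def build_call_path(callchain, get_call_path_id):
    parent_id = 0
    for node in reversed(callchain):  # root (e.g. main) first
        parent_id = get_call_path_id(parent_id, node)
    return parent_id  # id of the leaf's call path

# Stub recording the (parent_id, node) pairs requested, returning new ids.
created = []
def stub_get_id(parent_id, node):
    created.append((parent_id, node))
    return len(created)

leaf_first_chain = ["leaf_fn", "mid_fn", "main"]  # assumed perf order
leaf_id = build_call_path(leaf_first_chain, stub_get_id)

assert created[0] == (0, "main")      # the root hangs off parent 0
assert created[-1] == (2, "leaf_fn")  # the leaf is deepest, not the root
assert leaf_id == 3
```

The forward iteration in the patch produces the mirror image: created[0] would be (0, "leaf_fn"), putting leaves at the root of the tree.

[ ...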
]

> +        print("Adding primary keys")
> +        self.do_query("ALTER TABLE selected_events ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE machines ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE threads ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE comms ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE comm_threads ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE dsos ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE symbols ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE branch_types ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE samples ADD PRIMARY KEY (id)")
> +        self.do_query("ALTER TABLE call_paths ADD PRIMARY KEY (id)")

This isn't a bug, but were the foreign key constraints from the original
implementation intentionally dropped? The previous script enforced relational
integrity by adding foreign keys at the end of the export; those constraints
seem to be missing here.

[ ... ]

> +if __name__ == "__main__":
> +    ap = argparse.ArgumentParser(
> +        description="Export perf data to a postgresql database")
> +    ap.add_argument("-i", "--input", default="perf.data",
> +                    help="Input file name")
> +    ap.add_argument("-o", "--output", required=True,
> +                    help="Output database name")
> +    args = ap.parse_args()
> +
> +    exporter = PostgresExporter(args.output)
> +    exporter.setup_db()
> +
> +    session = None
> +    error_occurred = False
> +    try:

Could exceptions in setup_db() bypass the cleanup logic? If setup_db() fails,
the script will terminate immediately without executing the finally block,
which might leave orphaned temporary directories on the filesystem.

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260425224951.174663-1-irogers@google.com?part=37