Re: [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation

From: Eric Biggers <ebiggers@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>,
	James Clark <james.clark@linaro.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Pablo Galindo <pablogsal@gmail.com>,
	Fangrui Song <maskray@sourceware.org>
Subject: Re: [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation
Date: Tue, 25 Nov 2025 19:29:43 +0000
Message-ID: <20251125192943.GA3061247@google.com>
In-Reply-To: <20251125080748.461014-1-namhyung@kernel.org>

On Tue, Nov 25, 2025 at 12:07:46AM -0800, Namhyung Kim wrote:
> It was reported that python backtrace with JIT dump was broken after the
> change to built-in SHA-1 implementation.  It seems python generates the
> same JIT code for each function.  They will become separate DSOs but the
> contents are the same.  Only difference is in the symbol name.
> 
> But this caused a problem that every JIT'ed DSOs will have the same
> build-ID which makes perf confused.  And it resulted in no python
> symbols (from JIT) in the output.
> 
> Looking back at the original code before the conversion, it used the
> load_addr as well as the code section to distinguish each DSO.  But it'd
> be better to use contents of symtab and strtab instead as it aligns with
> some linker behaviors.
> 
> This patch adds a buffer to save all the contents in a single place for
> SHA-1 calculation.  Probably we need to add sha1_update() or similar to
> update the existing hash value with different contents and use it here.
> But it's out of scope for this change and I'd like something that can be
> backported to the stable trees easily.
> 
> Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Pablo Galindo <pablogsal@gmail.com>
> Cc: Fangrui Song <maskray@sourceware.org>
> Link: https://github.com/python/cpython/issues/139544
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

That commit actually preserved the behavior of the existing variant of
gen_build_id() that was under #ifdef BUILD_ID_SHA.  So I guess that code
was always broken, and it was just never noticed because the alternative
variant of gen_build_id() under #ifdef BUILD_ID_MD5 was used instead?

The MD5 variant of gen_build_id() just hashed the load_addr concatenated
with the code.  That's not what this patch does, though.  So just to
clarify, you'd actually like to go with a third approach rather than
just restoring the original hash(load_addr || code) approach?
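
To make the difference concrete, here is a minimal Python sketch (not perf
code).  The function names, the made-up addresses, and the 8-byte
little-endian encoding of load_addr are all assumptions for illustration
only.  It shows why hashing the code alone gives identical IDs for identical
JIT code, while hash(load_addr || code) does not:

```python
import hashlib
import struct


def build_id_code_only(code: bytes) -> bytes:
    """Hash only the code bytes; identical code yields identical IDs."""
    return hashlib.sha1(code).digest()


def build_id_addr_and_code(load_addr: int, code: bytes) -> bytes:
    """Hash load_addr || code; the 8-byte little-endian encoding is assumed."""
    return hashlib.sha1(struct.pack("<Q", load_addr) + code).digest()


# Two JIT'ed functions with identical code at different (made-up) addresses.
code = bytes([0x90] * 16)
same_ids = build_id_code_only(code) == build_id_code_only(code)
distinct_ids = (build_id_addr_and_code(0x1000, code)
                != build_id_addr_and_code(0x2000, code))
print(same_ids, distinct_ids)
```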

Also, I missed that you had actually changed the hash algorithm.  I had
assumed the perf folks were pushing SHA-1 because they were already
using it.  Given that the algorithm changed, there must not be any
backwards compatibility concerns here, and you should switch to a modern
hash algorithm such as SHA-256 instead.

I'd be glad to add an incremental API if you need it, but I'm confused
why you want SHA-1 and not a modern hash algorithm.
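
For reference, the shape of an incremental (init/update/final) interface can
be sketched with Python's hashlib.  This is only an illustration of the
concept, not a proposed perf API; the helper names and the placeholder
inputs are hypothetical.  Feeding the pieces one at a time gives the same
digest as hashing their concatenation, so no temporary buffer is needed:

```python
import hashlib


def hash_incremental(pieces):
    """Feed each piece to the hash state in turn, then finalize."""
    h = hashlib.sha256()
    for piece in pieces:
        h.update(piece)
    return h.digest()


def hash_one_shot(pieces):
    """Copy all pieces into one buffer and hash it in a single call."""
    return hashlib.sha256(b"".join(pieces)).digest()


# Placeholder inputs standing in for the code, symtab and strtab contents.
pieces = [b"code bytes", b"symtab bytes", b"strtab bytes"]
assert hash_incremental(pieces) == hash_one_shot(pieces)
```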

- Eric

Thread overview: 14+ messages
2025-11-25  8:07 [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation Namhyung Kim
2025-11-25  8:07 ` [PATCH v2 2/2] perf test: Add python JIT dump test Namhyung Kim
2025-11-25 19:29 ` Eric Biggers [this message]
2025-11-26  2:55   ` [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation Ian Rogers
2025-11-26  3:04     ` Eric Biggers
2025-11-27 21:17       ` Namhyung Kim
2025-11-27 22:18         ` Eric Biggers
2025-12-02 21:56           ` Namhyung Kim
2025-12-02 23:26             ` Ian Rogers
2025-12-03  0:06               ` Namhyung Kim
2025-12-02 23:27             ` Eric Biggers
2025-12-02 23:32               ` Ian Rogers
2025-12-03  1:28             ` Fangrui Song
2025-12-03 17:58 ` Namhyung Kim