From: Eric Biggers <ebiggers@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Ian Rogers <irogers@google.com>,
	James Clark <james.clark@linaro.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Pablo Galindo <pablogsal@gmail.com>,
	Fangrui Song <maskray@sourceware.org>
Subject: Re: [PATCH v2 1/2] perf jitdump: Add sym/str-tables to build-ID generation
Date: Tue, 25 Nov 2025 19:29:43 +0000
Message-ID: <20251125192943.GA3061247@google.com>
In-Reply-To: <20251125080748.461014-1-namhyung@kernel.org>

On Tue, Nov 25, 2025 at 12:07:46AM -0800, Namhyung Kim wrote:
> It was reported that python backtrace with JIT dump was broken after the
> change to built-in SHA-1 implementation.  It seems python generates the
> same JIT code for each function.  They will become separate DSOs but the
> contents are the same.  Only difference is in the symbol name.
> 
> But this caused a problem that every JIT'ed DSOs will have the same
> build-ID which makes perf confused.  And it resulted in no python
> symbols (from JIT) in the output.
> 
> Looking back at the original code before the conversion, it used the
> load_addr as well as the code section to distinguish each DSO.  But it'd
> be better to use contents of symtab and strtab instead as it aligns with
> some linker behaviors.
> 
> This patch adds a buffer to save all the contents in a single place for
> SHA-1 calculation.  Probably we need to add sha1_update() or similar to
> update the existing hash value with different contents and use it here.
> But it's out of scope for this change and I'd like something that can be
> backported to the stable trees easily.
> 
> Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Pablo Galindo <pablogsal@gmail.com>
> Cc: Fangrui Song <maskray@sourceware.org>
> Link: https://github.com/python/cpython/issues/139544
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

That commit actually preserved the behavior of the existing variant of
gen_build_id() that was under #ifdef BUILD_ID_SHA.  So I guess that code
was always broken, and it was just never noticed because the alternative
variant of gen_build_id() under #ifdef BUILD_ID_MD5 was used instead?

The MD5 variant of gen_build_id() just hashed the load_addr concatenated
with the code.  That's not what this patch does, though.  So just to
clarify, you'd actually like to go with a third approach rather than
just restoring the original hash(load_addr || code) approach?
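
To make the difference between these input choices concrete, here is a
minimal Python sketch.  It is not perf code: the function names, the
little-endian 64-bit encoding of load_addr, and the sample byte strings
are all assumptions made for illustration.  SHA-1 is used for every
variant so that only the hashed input differs.

```python
import hashlib
import struct


def build_id_code_only(code: bytes) -> str:
    # Hash only the code bytes: identical JIT code gives identical IDs.
    return hashlib.sha1(code).hexdigest()


def build_id_addr_code(load_addr: int, code: bytes) -> str:
    # hash(load_addr || code); the 64-bit little-endian packing is an
    # assumption, not necessarily the layout perf used.
    return hashlib.sha1(struct.pack("<Q", load_addr) + code).hexdigest()


def build_id_with_tables(code: bytes, symtab: bytes, strtab: bytes) -> str:
    # Hash the code together with symbol-table and string-table bytes,
    # so a differing symbol name changes the resulting ID.
    return hashlib.sha1(code + symtab + strtab).hexdigest()


# Two hypothetical JIT'ed DSOs with the same code but different names
# and load addresses.
code = b"\x90" * 16
symtab = b"\x01\x00\x00\x00"
dso_a = {"addr": 0x7F0000001000, "strtab": b"\0py::foo\0"}
dso_b = {"addr": 0x7F0000002000, "strtab": b"\0py::bar\0"}

same_ids = build_id_code_only(code) == build_id_code_only(code)
addr_ids_differ = (build_id_addr_code(dso_a["addr"], code)
                   != build_id_addr_code(dso_b["addr"], code))
table_ids_differ = (build_id_with_tables(code, symtab, dso_a["strtab"])
                    != build_id_with_tables(code, symtab, dso_b["strtab"]))
```

Under these assumptions the code-only hash collides for the two DSOs,
while both the load_addr variant and the symtab/strtab variant keep
them apart.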

Also, I missed that you had actually changed the hash algorithm.  I had
assumed the perf folks were pushing SHA-1 because they were already
using it.  Given that the algorithm changed, there must not be any
backwards compatibility concerns here, and you should switch to a modern
hash algorithm such as SHA-256 instead.

I'd be glad to add an incremental API if you need it, but I'm confused
why you want SHA-1 and not a modern hash algorithm.
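
As a rough illustration of what an incremental interface buys, here is
a sketch using Python's hashlib rather than the perf helpers.  The
function names and sample inputs are hypothetical.  Feeding the pieces
one at a time yields the same digest as hashing their concatenation, so
no temporary buffer holding all the contents is needed.

```python
import hashlib


def digest_one_shot(parts):
    # Concatenate everything into a single buffer, then hash it once.
    return hashlib.sha256(b"".join(parts)).hexdigest()


def digest_incremental(parts):
    # Feed each piece to the running hash state; no combined buffer.
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.hexdigest()


# Hypothetical stand-ins for the code, symtab and strtab contents.
parts = [b"\x90" * 16, b"\x01\x00\x00\x00", b"\0py::foo\0"]
digests_match = digest_one_shot(parts) == digest_incremental(parts)
```

Note that a SHA-256 digest is 32 bytes, compared with 20 bytes for
SHA-1.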

- Eric
