From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
James Clark <james.clark@linaro.org>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-perf-users@vger.kernel.org,
Eric Biggers <ebiggers@kernel.org>,
Pablo Galindo <pablogsal@gmail.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation
Date: Fri, 14 Nov 2025 10:57:12 -0800
Message-ID: <aRd7iJJxgT-0G8Ky@google.com>
In-Reply-To: <CAP-5=fWsFLkYGFFyzSrpZNdRLpjaTZAuW7-YVUr-zMVH5dk8eg@mail.gmail.com>

On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > It was reported that python backtrace with JIT dump was broken after the
> > change to built-in SHA-1 implementation. It seems python generates the
> > same JIT code for each function. They will become separate DSOs but the
> > contents are the same. Only difference is in the symbol name.
> >
> > But this caused a problem that every JIT'ed DSOs will have the same
> > build-ID which makes perf confused. And it resulted in no python
> > symbols (from JIT) in the output.
>
> The lookup of a DSO involves the build ID and the filename. I'm
> confused as to why things weren't deduplicated and why no symbols
> rather than repeatedly the same symbol?

I don't know, but that's the symptom in the original bug report on the
Python GitHub tracker (see the Link: tag below).  I guess the behavior is
non-deterministic.

>
> > Looking back at the original code before the conversion, it used the
> > load_addr as well as the code section to distinguish each DSO. I think
> > we should do the same or use symbol table as an additional input for
> > SHA-1.
>
> Hmm.. the build ID for the contents of the code should be a constant.
> As the build ID is a note for the entire ELF file then something is
> wrong with the filename handling it seems.

When perf tries to load symbols for a DSO, it prefers reading from the
build-ID cache over the file system, since it trusts build-IDs more than
path names.  See dso__load() and binary_type_symtab[].

So having multiple DSOs with the same build-ID can be a problem if they
are in the build-ID cache.  Normally `perf inject -j` won't add the new
JIT-ed DSOs to the build-ID cache, but it's still possible.

Thanks,
Namhyung

>
> > This patch is a quick-and-dirty fix just to add each byte of the
> > load_addr to the first 8 bytes of SHA-1 result. Probably we need to add
> > sha1_update() or similar to update the existing hash value and use it
> > here. I'd like something that can be backported to the stable trees
> > easily.
> >
> > Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> > Cc: Eric Biggers <ebiggers@kernel.org>
> > Cc: Pablo Galindo <pablogsal@gmail.com>
> > Link: https://github.com/python/cpython/issues/139544
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/util/genelf.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
> > index 591548b10e34ef6a..a412e6faf70e37f3 100644
> > --- a/tools/perf/util/genelf.c
> > +++ b/tools/perf/util/genelf.c
> > @@ -395,6 +395,15 @@ jit_write_elf(int fd, uint64_t load_addr __maybe_unused, const char *sym,
> > * build-id generation
> > */
> > sha1(code, csize, bnote.build_id);
> > + /* FIXME: update the SHA-1 hash using additional contents */
> > + bnote.build_id[0] += (load_addr >> 0) & 0xff;
> > + bnote.build_id[1] += (load_addr >> 8) & 0xff;
> > + bnote.build_id[2] += (load_addr >> 16) & 0xff;
> > + bnote.build_id[3] += (load_addr >> 24) & 0xff;
> > + bnote.build_id[4] += (load_addr >> 32) & 0xff;
> > + bnote.build_id[5] += (load_addr >> 40) & 0xff;
> > + bnote.build_id[6] += (load_addr >> 48) & 0xff;
> > + bnote.build_id[7] += (load_addr >> 56) & 0xff;
> > bnote.desc.namesz = sizeof(bnote.name); /* must include 0 termination */
> > bnote.desc.descsz = sizeof(bnote.build_id);
> > bnote.desc.type = NT_GNU_BUILD_ID;
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >