From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	James Clark <james.clark@linaro.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Eric Biggers <ebiggers@kernel.org>,
	Pablo Galindo <pablogsal@gmail.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation
Date: Fri, 14 Nov 2025 10:57:12 -0800
Message-ID: <aRd7iJJxgT-0G8Ky@google.com>
In-Reply-To: <CAP-5=fWsFLkYGFFyzSrpZNdRLpjaTZAuW7-YVUr-zMVH5dk8eg@mail.gmail.com>

On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > It was reported that python backtrace with JIT dump was broken after the
> > change to built-in SHA-1 implementation.  It seems python generates the
> > same JIT code for each function.  They will become separate DSOs but the
> > contents are the same.  Only difference is in the symbol name.
> >
> > But this caused a problem that every JIT'ed DSOs will have the same
> > build-ID which makes perf confused.  And it resulted in no python
> > symbols (from JIT) in the output.
> 
> The lookup of a DSO involves the build ID and the filename. I'm
> confused as to why things weren't deduplicated and why no symbols
> rather than repeatedly the same symbol?

I don't know, but that's the symptom in the original bug report on the
Python GitHub tracker (see the Link: tag below).  I guess the behavior
is non-deterministic.

> 
> > Looking back at the original code before the conversion, it used the
> > load_addr as well as the code section to distinguish each DSO.  I think
> > we should do the same or use symbol table as an additional input for
> > SHA-1.
> 
> Hmm.. the build ID for the contents of the code should be a constant.
> As the build ID is a note for the entire ELF file then something is
> wrong with the filename handling it seems.

When perf tries to load symbols for a DSO, it prefers reading from the
build-ID cache over the file system, since it trusts build-IDs more than
path names.  See dso__load() and binary_type_symtab[].

So having multiple DSOs with the same build-ID can be a problem if they
are in the build-ID cache.  Normally `perf inject -j` won't add the new
JIT-ed DSOs to the build-ID cache, but it's still possible.
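
To make the collision concrete, here is a minimal standalone sketch of
what the patch does to the build-ID.  It is not the perf code:
toy_digest() is a hypothetical stand-in for the built-in sha1() (it is
an FNV-1a based mixer, not SHA-1), and mix_load_addr() and
jit_build_id() are illustrative names.  Only the byte-wise addition of
load_addr into the first 8 bytes mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUILD_ID_SIZE 20	/* same length as a SHA-1 digest */

/*
 * Hypothetical stand-in for perf's sha1(): NOT SHA-1, just a
 * deterministic FNV-1a based mixer so the sketch has no dependencies.
 */
static void toy_digest(const void *data, size_t len,
		       uint8_t out[BUILD_ID_SIZE])
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	for (i = 0; i < BUILD_ID_SIZE; i++) {
		h ^= (uint32_t)i;
		h *= 16777619u;
		out[i] = (uint8_t)(h >> 24);
	}
}

/* Same arithmetic as the eight "+=" lines in the patch, as a loop. */
static void mix_load_addr(uint8_t build_id[BUILD_ID_SIZE],
			  uint64_t load_addr)
{
	int i;

	for (i = 0; i < 8; i++)
		build_id[i] = (uint8_t)(build_id[i] +
					((load_addr >> (8 * i)) & 0xff));
}

/* Digest the code bytes, then perturb the result with load_addr. */
static void jit_build_id(const void *code, size_t csize,
			 uint64_t load_addr,
			 uint8_t build_id[BUILD_ID_SIZE])
{
	assert(code != NULL || csize == 0);
	toy_digest(code, csize, build_id);
	mix_load_addr(build_id, load_addr);
}
```

Calling jit_build_id() twice with the same code bytes but different
load_addr values gives IDs that differ only within the first 8 bytes,
while the same (code, load_addr) pair always gives the same ID.  That
is enough to keep identical JIT stubs apart, though it is an additive
tweak on top of the digest rather than a real hash update.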

Thanks,
Namhyung

> 
> > This patch is a quick-and-dirty fix just to add each byte of the
> > load_addr to the first 8 bytes of SHA-1 result.  Probably we need to add
> > sha1_update() or similar to update the existing hash value and use it
> > here.  I'd like something that can be backported to the stable trees
> > easily.
> >
> > Fixes: e3f612c1d8f3945b ("perf genelf: Remove libcrypto dependency and use built-in sha1()")
> > Cc: Eric Biggers <ebiggers@kernel.org>
> > Cc: Pablo Galindo <pablogsal@gmail.com>
> > Link: https://github.com/python/cpython/issues/139544
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/genelf.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/tools/perf/util/genelf.c b/tools/perf/util/genelf.c
> > index 591548b10e34ef6a..a412e6faf70e37f3 100644
> > --- a/tools/perf/util/genelf.c
> > +++ b/tools/perf/util/genelf.c
> > @@ -395,6 +395,15 @@ jit_write_elf(int fd, uint64_t load_addr __maybe_unused, const char *sym,
> >          * build-id generation
> >          */
> >         sha1(code, csize, bnote.build_id);
> > +       /* FIXME: update the SHA-1 hash using additional contents */
> > +       bnote.build_id[0] += (load_addr >> 0) & 0xff;
> > +       bnote.build_id[1] += (load_addr >> 8) & 0xff;
> > +       bnote.build_id[2] += (load_addr >> 16) & 0xff;
> > +       bnote.build_id[3] += (load_addr >> 24) & 0xff;
> > +       bnote.build_id[4] += (load_addr >> 32) & 0xff;
> > +       bnote.build_id[5] += (load_addr >> 40) & 0xff;
> > +       bnote.build_id[6] += (load_addr >> 48) & 0xff;
> > +       bnote.build_id[7] += (load_addr >> 56) & 0xff;
> >         bnote.desc.namesz = sizeof(bnote.name); /* must include 0 termination */
> >         bnote.desc.descsz = sizeof(bnote.build_id);
> >         bnote.desc.type   = NT_GNU_BUILD_ID;
> > --
> > 2.52.0.rc1.455.g30608eb744-goog
> >
