From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: maskray@sourceware.org,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	James Clark <james.clark@linaro.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Eric Biggers <ebiggers@kernel.org>,
	Pablo Galindo <pablogsal@gmail.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation
Date: Fri, 14 Nov 2025 15:24:46 -0800
Message-ID: <aRe6PnZ1rk2oOz85@google.com>
In-Reply-To: <CAP-5=fUC88TVwnjTuTgU+s=LqqP0xOhNTG5hU27cSbaZRH7Jpg@mail.gmail.com>

On Fri, Nov 14, 2025 at 11:32:52AM -0800, Ian Rogers wrote:
> On Fri, Nov 14, 2025 at 10:57 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> > > On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > It was reported that Python backtraces with JIT dump broke after the
> > > > change to the built-in SHA-1 implementation.  It seems Python generates the
> > > > same JIT code for each function.  They become separate DSOs but the
> > > > contents are the same.  The only difference is in the symbol name.
> > > >
> > > > This caused a problem: every JIT'ed DSO has the same
> > > > build-ID, which confuses perf.  It resulted in no Python
> > > > symbols (from JIT) in the output.
> > >
> > > The lookup of a DSO involves the build ID and the filename. I'm
> > > confused as to why things weren't deduplicated and why no symbols
> > > rather than repeatedly the same symbol?
> >
> > I don't know, but that's the symptom in the original bug report in the
> > python github (see Links: below).  I guess the behavior is
> > non-deterministic.
> >
> > >
> > > > Looking back at the original code before the conversion, it used the
> > > > load_addr as well as the code section to distinguish each DSO.  I think
> > > > we should do the same or use symbol table as an additional input for
> > > > SHA-1.
> > >
> > > Hmm.. the build ID for the contents of the code should be a constant.
> > > As the build ID is a note for the entire ELF file then something is
> > > wrong with the filename handling it seems.
> >
> > When it tries to load symbols from a DSO, it prefers reading from the
> > build-ID cache over the file system since it trusts build-IDs more than
> > the path name.  See dso__load() and binary_type_symtab[].
> >
> > So having multiple DSOs with the same build-ID can be a problem if they
> > are in the build-ID cache.  Normally `perf inject -j` won't add the new
> > JIT-ed DSOs to the build-ID cache but it's still possible.
> 
> +Fangrui
> 
> I'm surprised that build IDs don't include symbol names but:
> ```
> $ cat a.s
> .text
> .global main
> .global foo
> main:
> foo:
>        ret
> $ cat b.s
> .text
> .global main
> .global bar
> main:
> bar:
>        ret
> $ gcc -Wl,--build-id a.s -o a.out
> $ gcc -Wl,--build-id b.s -o b.out
> $ readelf -n a.out
> ...
>    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> ...
> $ readelf -n b.out
> ...
>    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> ...
> ```
> so ugh. Perhaps we need to have jitdump make a single object file (and
> so 1 build ID) but with multiple unique symbols.

Right, that'd be better.  But I'm afraid some JIT code could be spread
across many segments, so it may not be possible to create a single map
that covers all of them without conflicting with other libraries.
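
To illustrate why the load address helps, here is a minimal sketch in
Python (not the perf code).  It assumes the build-ID is just SHA-1 over
the code bytes, optionally prefixed with the load address; the function
name, the byte order and the ordering of the inputs are made up for
illustration and the sample addresses are arbitrary.

```python
import hashlib
import struct
from typing import Optional


def sketch_build_id(code: bytes, load_addr: Optional[int] = None) -> str:
    """Hypothetical helper: SHA-1 of the code, optionally mixed with load_addr."""
    h = hashlib.sha1()
    if load_addr is not None:
        # Assumption: the address is hashed as a little-endian 64-bit value.
        h.update(struct.pack("<Q", load_addr))
    h.update(code)
    return h.hexdigest()


code = b"\xc3"  # a single x86-64 `ret`, identical for every "function"

# Hashing only the contents: two JIT-ed DSOs collide.
same_a = sketch_build_id(code)
same_b = sketch_build_id(code)

# Mixing in the (arbitrary, illustrative) load address: they differ.
uniq_a = sketch_build_id(code, 0x7F0000001000)
uniq_b = sketch_build_id(code, 0x7F0000002000)

print(same_a == same_b)
print(uniq_a == uniq_b)
```

With identical code the first comparison is true and the second is
false, which is the distinction the old code relied on.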

Thanks,
Namhyung

