public inbox for linux-kernel@vger.kernel.org
From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: maskray@sourceware.org,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	James Clark <james.clark@linaro.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Eric Biggers <ebiggers@kernel.org>,
	Pablo Galindo <pablogsal@gmail.com>
Subject: Re: [PATCH 1/2] perf jitdump: Add load_addr to build-ID generation
Date: Fri, 14 Nov 2025 15:24:46 -0800
Message-ID: <aRe6PnZ1rk2oOz85@google.com>
In-Reply-To: <CAP-5=fUC88TVwnjTuTgU+s=LqqP0xOhNTG5hU27cSbaZRH7Jpg@mail.gmail.com>

On Fri, Nov 14, 2025 at 11:32:52AM -0800, Ian Rogers wrote:
> On Fri, Nov 14, 2025 at 10:57 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Nov 14, 2025 at 09:33:29AM -0800, Ian Rogers wrote:
> > > On Fri, Nov 14, 2025 at 1:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > It was reported that Python backtraces with JIT dump were broken after
> > > > the change to the built-in SHA-1 implementation.  It seems Python
> > > > generates the same JIT code for each function.  They will become
> > > > separate DSOs but the contents are the same.  The only difference is in
> > > > the symbol name.
> > > >
> > > > But this means all JIT'ed DSOs end up with the same build-ID, which
> > > > confuses perf.  The result was no Python symbols (from JIT) in the
> > > > output.
> > >
> > > The lookup of a DSO involves the build ID and the filename. I'm
> > > confused as to why things weren't deduplicated and why no symbols
> > > rather than repeatedly the same symbol?
> >
> > I don't know, but that's the symptom in the original bug report in the
> > python github (see Links: below).  I guess the behavior is
> > non-deterministic.
> >
> > >
> > > > Looking back at the original code before the conversion, it used the
> > > > load_addr as well as the code section to distinguish each DSO.  I think
> > > > we should do the same or use symbol table as an additional input for
> > > > SHA-1.
> > >
> > > Hmm.. the build ID for the contents of the code should be a constant.
> > > As the build ID is a note for the entire ELF file then something is
> > > wrong with the filename handling it seems.
> >
> > When it tries to load symbols from a DSO, it prefers reading from the
> > build-ID cache over the file system since it trusts build-IDs more than
> > the path name.  See dso__load() and binary_type_symtab[].
> >
> > So having multiple DSOs with the same build-ID can be a problem if they
> > are in the build-ID cache.  Normally `perf inject -j` won't add the new
> > JIT-ed DSOs to the build-ID cache but it's still possible.
> 
> +Fangrui
> 
> I'm surprised that build IDs don't include symbol names but:
> ```
> $ cat a.s
> .text
> .global main
> .global foo
> main:
> foo:
>        ret
> $ cat b.s
> .text
> .global main
> .global bar
> main:
> bar:
>        ret
> $ gcc -Wl,--build-id a.s -o a.out
> $ gcc -Wl,--build-id b.s -o b.out
> $ readelf -n a.out
> ...
>    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> ...
> $ readelf -n b.out
> ...
>    Build ID: 9dd0371b953db5d72929af5d98552e4ee1043616
> ...
> ```
> so ugh. Perhaps we need to have jitdump make a single object file (and
> so 1 build ID) but with multiple unique symbols.

Right, that'd be better.  But I'm afraid some JIT code could be spread
across many segments, so it may not be possible to create a single map
that covers all of them without conflicting with the maps of other
libraries.
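
To illustrate the collision discussed above, here is a minimal sketch in
Python.  It is not the perf code: the function names, the symbol names,
the addresses and the way load_addr is packed into the hash input are
all assumptions made for the example.  It only shows that hashing
identical code bytes gives one build-ID, so a cache keyed by build-ID
keeps a single entry, while mixing the load address into the SHA-1
input gives each JIT'ed DSO its own ID.

```python
import hashlib
import struct


def build_id_code_only(code: bytes) -> str:
    """SHA-1 over the code bytes only (hypothetical helper)."""
    return hashlib.sha1(code).hexdigest()


def build_id_with_load_addr(code: bytes, load_addr: int) -> str:
    """SHA-1 over load_addr followed by the code bytes.

    The little-endian 64-bit packing is an assumption for this sketch,
    not necessarily the layout perf uses.
    """
    h = hashlib.sha1()
    h.update(struct.pack("<Q", load_addr))
    h.update(code)
    return h.hexdigest()


# Two JIT'ed functions with identical code (a single x86-64 `ret`)
# loaded at different, made-up addresses.
code = b"\xc3"
jit_dsos = {"py::foo": 0x7F0000001000, "py::bar": 0x7F0000002000}

# Model a cache keyed by build-ID: a later entry with the same ID
# overwrites the earlier one.
cache_code_only = {}
cache_with_addr = {}
for name, addr in jit_dsos.items():
    cache_code_only[build_id_code_only(code)] = name
    cache_with_addr[build_id_with_load_addr(code, addr)] = name

print(len(cache_code_only), len(cache_with_addr))
```

With code-only hashing the two DSOs collapse into one cache entry; with
the load address included they stay separate.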

Thanks,
Namhyung

