All of lore.kernel.org
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Kan Liang <kan.liang@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH] perf report: Add 'tgid' sort key
Date: Mon, 24 Feb 2025 18:51:08 -0800	[thread overview]
Message-ID: <Z70wHEl6Sp0H0c-3@google.com> (raw)
In-Reply-To: <CAP-5=fXEEMFgPF2aZhKsfrY_En+qoqX20dWfuE_ad73Uxf0ZHQ@mail.gmail.com>

On Mon, Feb 24, 2025 at 10:18:37AM -0800, Ian Rogers wrote:
> On Thu, Feb 20, 2025 at 11:04 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Feb 20, 2025 at 09:12:46AM -0800, Ian Rogers wrote:
> > > On Wed, Feb 19, 2025 at 1:10 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 03:47:44PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > On Wed, Feb 19, 2025 at 03:37:10PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > > On Tue, Feb 18, 2025 at 02:03:01PM -0800, Namhyung Kim wrote:
> > > > > > > On Tue, Feb 18, 2025 at 10:01:33PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > > > > On Tue, Feb 18, 2025 at 09:36:52PM +0100, Arnaldo Carvalho de Melo wrote:
> > > > > > > > > So the call to maps_fixup_end() will set maps->end_broken to false,
> > > > > > > > > since it fixed up the map ends, etc, but then we insert more maps with
> > > > > > > > > broken ends:
> > > > > > > >
> > > > > > > > > #6  0x0000000000633d52 in check_invariants (maps=0xf967c0) at util/maps.c:95
> > > > > > > > > 95                                            assert(map__end(prev) <= map__end(map));
> > > > > > > > > (gdb) p prev->dso->name
> > > > > > > > > $1 = 0xfc47ab "bpf_trampoline_6442522522"
> > > > > > > >
> > > > > > > > So the above map is created overlapping a previously existing map:
> > > > > > > >
> > > > > > > > root@number:~# perf probe -l
> > > > > > > >   probe_perf:maps_fixup_end (on maps__fixup_end:1@util/maps.c in /home/acme/bin/perf with maps)
> > > > > > > >   probe_perf:maps_insert (on maps__insert:1@util/maps.c in /home/acme/bin/perf with maps name start end)
> > > > > > > > root@number:~#
> > > > > > > >
> > > > > > > > root@number:~# perf trace --lib -e probe_perf:maps* perf record sleep
> > > > > > > > <SNIP>
> > > > > > > >    319.791 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_egress" start=0xffffffffc0160788 end=0xffffffffc01607c8)
> > > > > > > >    319.810 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_ingress" start=0xffffffffc01647b8 end=0xffffffffc01647f8)
> > > > > > > >    319.822 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_egress" start=0xffffffffc016482c end=0xffffffffc016486c)
> > > > > > > >    319.834 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_6deef7357e7b4530_sd_fw_ingress" start=0xffffffffc01648ac end=0xffffffffc01648ec)
> > > > > > > >    319.845 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_prog_be31ae23198a0378_sd_devices" start=0xffffffffc0186388 end=0xffffffffc01864b2)
> > > > > > > >    319.857 perf/1732173 probe_perf:maps_insert((634e5e) maps=0x2d9715d0 name="bpf_trampoline_6442522522" start=0xffffffffc0147640 end=0xffffffffc0148640)
> > > > > > > > [ perf record: Captured and wrote 0.035 MB perf.data (7 samples) ]
> > > > > > > > perf: util/maps.c:95: check_invariants: Assertion `map__end(prev) <= map__end(map)' failed.
> > > > > > > > root@number:~#
> > > > > > > >
> > > > > > > > So a PERF_RECORD_KSYMBOL processing will add a map for
> > > > > > > > "bpf_trampoline_6442522522" that has its start after before the
> > > > > > > > "bpf_prog_40ddf486530245f5_sd_devices" start, ok, but ends after
> > > > > > > > "bpf_prog_40ddf486530245f5_sd_devices", overlapping it.
> > > > > > > >
> > > > > > > > machine__process_ksymbol_register() does:
> > > > > > > >
> > > > > > > > 713                     map__set_start(map, event->ksymbol.addr);
> > > > > > > > 714                     map__set_end(map, map__start(map) + event->ksymbol.len);
> > > > > > > > 715                     err = maps__insert(machine__kernel_maps(machine), map);
> > > > > > > >
> > > > > > > > And:
> > > > > > > >
> > > > > > > > (gdb) p /x event->ksymbol.addr
> > > > > > > > $2 = 0xffffffffc0147a2c
> > > > > > > > (gdb) p event->ksymbol.len
> > > > > > > > $3 = 306
> > > > > > >
> > > > > > > Hmm.. so I think the situation is like below.
> > > > > > >
> > > > > > >              (bpf_trampoline_6442522522)
> > > > > > >       +---------------------------------------+
> > > > > > >       |                                       |
> > > > > > >       |       +------------------------+      |
> > > > > > >       |       | (bpf_prog_40ddf486...) | <----+----  adding this
> > > > > > >       |       |                        |      |
> > > > > > >       |       |                        |      |
> > > > > > >       |   c0147a2c                            |
> > > > > > >       |                                       |
> > > > > > >   c0147640                                 c0148640
> > > > > > >
> > > > > > > And it failed to add bpf_prog_40ddf486... in check_invariants() because
> > > > > > > the end address is smaller than the previous map.
> > > > > >
> > > > > > No, it didn't fail to add, it managed to do it which left the kernel
> > > > > > maps in a broken state, with overlappings while it had a cleared
> > > > > > ends_broken, then, later, when the checks_invariant is finally called at
> > > > > > perf record exit time:
> > > > >
> > > > > Nope, __maps__insert() should notice that the ends are broken and set
> > > > > it:
> > > > >
> > > > >         if (nr_maps == 1) {
> > > > >                 /* If there's just 1 entry then maps are sorted. */
> > > > >                 maps__set_maps_by_address_sorted(maps, true);
> > > > >                 maps__set_maps_by_name_sorted(maps, maps_by_name != NULL);
> > > > >         } else {
> > > > >                 /* Sorted if maps were already sorted and this map starts after the last one. */
> > > > >                 maps__set_maps_by_address_sorted(maps,
> > > > >                         maps__maps_by_address_sorted(maps) &&
> > > > >                         map__end(maps_by_address[nr_maps - 2]) <= map__start(new));
> > > > >                 maps__set_maps_by_name_sorted(maps, false);
> > > > >         }
> > > > >         if (map__end(new) < map__start(new))
> > > > >                 RC_CHK_ACCESS(maps)->ends_broken = true;
> > > > >
> > > > >
> > > > > humm, RC_CHK_ACCESS(maps)->ends_broken should be set for the case we
> > > > > have and I think it isn't being... Then the bpf trampoline map that is
> > > > > the last entry to be added is before the last entry and thus
> > > > > maps_by_address_sorted is set to false, ends_broken continues false and
> > > > > at the end maps_by_address_sorted is set to true and the last
> > > > > check_invariants triggerrs the asserts...
> > > >
> > > > Right, probably it needs to set the ends_broken when the end address of
> > > > the new map is smaller than the previous (but the start address is
> > > > bigger) and fixup the end address when it sorts the maps by address.
> > >
> > > Ugh, I get git blamed for ends_broken and I was wondering what the heck it is:
> > > https://lore.kernel.org/all/20240210031746.4057262-2-irogers@google.com/
> > > My memory is that when the rb-tree was built the maps put in it could
> > > be broken and ends_broken was to capture we were in this state as the
> > > sorting would get broken, invariants be off, etc.. The rb-tree
> > > constructing code would then call maps__fixup_end. Having the caller
> > > call maps__fixup_end seems error prone, as does the whole
> > > "ends_broken" thing - remember I was in the code to fix memory leaks
> > > so modifying the maps API wasn't front of mind. I added ends_broken,
> > > the original rb-tree had no notion of it, because I was trying to get
> > > the invariants right for the testing I could do and ends_broken was
> > > the pragmatic thing to do for odd cases like kernel modules before
> > > maps__fixup_end is called.
> > >
> > > The maps API has evolved and we have a pretty robust, but possibly not
> > > fast, maps__fixup_overlap_and_insert:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/maps.h?h=perf-tools-next#n69
> > > I think ideally we'd make maps__insert uphold the invariants and not
> > > have ends_broken. I'm worried that making ends_broken more load
> > > bearing isn't the right thing to do, we may even be able to not have
> > > the variable for the "ifndef NDEBUG" case, which making it load
> > > bearing would completely defeat.
> > >
> > > So I think the fix here should be to understand the maps construction
> > > code for the modules, try to work out why maps__fixup_end wasn't
> > > called, perhaps migrate the code to maps__fixup_overlap_and_insert or
> > > add a missed maps__fixup_end call.
> >
> > IIUC module size in /proc/modules are wrong due to the reason in the
> > commit 876e80cf83d10585 ("perf tools: Fixup end address of modules") and
> > it called maps__fixup_end() for that.
> >
> > But the problem is some BPF maps processed at real-time during the
> > build-id processing at the end of perf record.  One map is inside of
> > another and check_invariants() didn't expect such maps and crashed.
> 
> I thought the real-time processing had to use
> maps__fixup_overlap_and_insert (rather than maps__insert) as mmap
> events only give us VMA data and two mmaps may have been merged.
> Shouldn't doing this change be the simplest fix?

Make sense.  How about this?

Thanks,
Namhyung


---8<---
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 316f0879e5e08d66..d80b34717090db44 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -717,7 +717,7 @@ static int machine__process_ksymbol_register(struct machine *machine,
 
 		map__set_start(map, event->ksymbol.addr);
 		map__set_end(map, map__start(map) + event->ksymbol.len);
-		err = maps__insert(machine__kernel_maps(machine), map);
+		err = maps__fixup_overlap_and_insert(machine__kernel_maps(machine), map);
 		if (err) {
 			err = -ENOMEM;
 			goto out;
@@ -1459,8 +1459,6 @@ static int machine__create_modules(struct machine *machine)
 	if (modules__parse(modules, machine, machine__create_module))
 		return -1;
 
-	maps__fixup_end(machine__kernel_maps(machine));
-
 	if (!machine__set_modules_path(machine))
 		return 0;
 
@@ -1554,6 +1552,8 @@ int machine__create_kernel_maps(struct machine *machine)
 		}
 	}
 
+	maps__fixup_end(machine__kernel_maps(machine));
+
 out_put:
 	dso__put(kernel);
 	return ret;

  reply	other threads:[~2025-02-25  2:51 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-06  0:01 [PATCH] perf report: Add 'tgid' sort key Namhyung Kim
2025-02-11 22:43 ` Ian Rogers
2025-02-12 21:05 ` Arnaldo Carvalho de Melo
2025-02-12 21:07   ` Arnaldo Carvalho de Melo
2025-02-12 21:59     ` Ian Rogers
2025-02-12 22:10       ` Ian Rogers
2025-02-13  1:52         ` Namhyung Kim
2025-02-14 22:22         ` Arnaldo Carvalho de Melo
2025-02-18 20:36           ` Arnaldo Carvalho de Melo
2025-02-18 21:01             ` Arnaldo Carvalho de Melo
2025-02-18 22:03               ` Namhyung Kim
2025-02-19 14:37                 ` Arnaldo Carvalho de Melo
2025-02-19 14:47                   ` Arnaldo Carvalho de Melo
2025-02-19 21:10                     ` Namhyung Kim
2025-02-20 17:12                       ` Ian Rogers
2025-02-21  7:04                         ` Namhyung Kim
2025-02-24 18:18                           ` Ian Rogers
2025-02-25  2:51                             ` Namhyung Kim [this message]
2025-02-25  4:40                               ` Ian Rogers
2025-02-25  7:51                                 ` Namhyung Kim
2025-02-25 19:07                                   ` Arnaldo Carvalho de Melo
2025-02-25 19:11                                     ` Arnaldo Carvalho de Melo
2025-02-25 19:25                                       ` Arnaldo Carvalho de Melo
2025-02-25 19:48                                         ` Arnaldo Carvalho de Melo
2025-02-26 19:34                                           ` Namhyung Kim
2025-02-26 21:37                                             ` Arnaldo Carvalho de Melo
2025-02-26 21:38                                               ` Arnaldo Carvalho de Melo
2025-02-26 21:40                                                 ` Arnaldo Carvalho de Melo
2025-02-26 21:49                                                   ` Arnaldo Carvalho de Melo
2025-02-26 22:17                                                     ` Namhyung Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z70wHEl6Sp0H0c-3@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.