linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Mateusz Kowalczyk" <fuuzetsu@fuuzetsu.co.uk>
To: "Ian Rogers" <irogers@google.com>,
	linux-perf-users <linux-perf-users@vger.kernel.org>
Subject: Re: Linux `perf report` regression
Date: Mon, 27 May 2024 09:37:41 +0900	[thread overview]
Message-ID: <a8ea965e-1a88-4297-8ef7-8a8e80ae6a4a@app.fastmail.com> (raw)
In-Reply-To: <CAP-5=fVpgRgx839qaeEbnOPSeMD5xXUwNv7Jeqe1PQj8yqV9_g@mail.gmail.com>

On Sat, 25 May 2024, at 5:17 AM, Ian Rogers wrote:
> On Fri, May 24, 2024 at 1:00 AM Mateusz Kowalczyk
> <fuuzetsu@fuuzetsu.co.uk> wrote:
>>
>> Hi Ian,
>>
>> I have recently upgraded my system and I have found that `perf report` no longer gives nice traces.
>>
>> For the same `perf record` capture, I used to get traces like this, with `perf report --hierarchy -M intel --no-inline`.
>>
>> ```
>> -   98.33%        sim
>>    -   42.66%        sim
>>       -    3.40%        [.] <serde_json::read::StrRead as serde_json::read::Read>::parse_str
>>            ordercache::Reader::open
>>            serde_json::de::from_str
>>            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
>>          + <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_map
>> ```
>>
>> However after upgrading, I get the following report on the very same file:
>>
>> ```
>> -   98.33%        sim
>>    -   42.66%        sim
>>       -    3.40%        [.] <serde_json::read::StrRead as serde_json::read::Read>::parse_str
>>            0x55e840085c78
>>            0x55e84008fb0d
>>            0x55e8400986d6
>>          + 0x55e840096122
>> ```
>>
>>
>> I have bisected the issue to this commit: https://lore.kernel.org/all/20230608232823.4027869-21-irogers@google.com/ a.k.a. `bffb5b0c0976aa46aaa961dd19a47c9d6301cfe1`. The working `perf report` is from the commit right before that, `1981da1fe2499823f626c86c5ba3be6b89844384`.
>>
>> I'm not sure if there's some mailing list I should be trying to report it at. I can not share the perf capture as-is but I can try to come up with something on Monday that should hopefully reproduce the problem, if necessary.
>
> Thanks Mateusz,
>
> The mailing list to use is linux-perf-users, added. Thanks for the
> bisect, looking at the patch in question it looks like prior to the
> fix there was probably a use after a value had been put. We probably
> need to increase the lifetime of the get/put to avoid the issue you
> are seeing. The reference count checking has some documentation here:
> https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
>
> Looking at the patch, and adding some annotations:
> ```
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -590,6 +590,7 @@ fill_node(struct callchain_node *node, struct
> callchain_cursor *cursor)
>   call->ip = cursor_node->ip;
>   call->ms = cursor_node->ms;
>   call->ms.map = map__get(call->ms.map);
> + call->ms.maps = maps__get(call->ms.maps); <---- adding a get should
> be safe, it increases a reference count and shouldn't make the thing
> go away
>   call->srcline = cursor_node->srcline;
>
>   if (cursor_node->branch) {
> @@ -649,6 +650,7 @@ add_child(struct callchain_node *parent,
>   list_for_each_entry_safe(call, tmp, &new->val, list) {
>   list_del_init(&call->list);
>   map__zput(call->ms.map);
> + maps__zput(call->ms.maps); <----- a put could be the source of the
> problem, especially a zput that makes the value NULL. However, the
> next line here frees "call" and so we're unlikely to have been using
> it later in the code.
>   free(call);
>   }
>   free(new);
> @@ -1010,10 +1012,16 @@ merge_chain_branch(struct callchain_cursor *cursor,
>   int err = 0;
>
>   list_for_each_entry_safe(list, next_list, &src->val, list) {
> - callchain_cursor_append(cursor, list->ip, &list->ms,
> - false, NULL, 0, 0, 0, list->srcline);
> + struct map_symbol ms = {
> + .maps = maps__get(list->ms.maps),
> + .map = map__get(list->ms.map), <--- gets should be safe
> + };
> + callchain_cursor_append(cursor, list->ip, &ms, false, NULL, 0, 0, 0,
> list->srcline);
>   list_del_init(&list->list);
> + map__zput(ms.map);
> + maps__zput(ms.maps); <--- these puts should match the gets above, so
> should be sound
>   map__zput(list->ms.map);
> + maps__zput(list->ms.maps); <--- these puts are on a value in "list"
> which is freed below, so we should be good.
>   free(list);
>   }
> ```
> Nothing was particularly jumping out from the changes but given the
> reproduction you have you should be able to revert the change in each
> file and narrow down which change is causing the issue.
>
> Thanks,
> Ian
>
>
>
>
>> I thought I'd fire an e-mail in case this is something obvious to you and the fix is simple. I tried to search on the web but I was unable to find anything useful. Maybe most people are using some old version or it's something to do with my setup.
>>
>> Please let me know if I should send this to some mailing list instead -- presumably with a reproduction example.
>>
>> Thank you!
>>
>> --
>> Mateusz K.

Hi again,

I was first able to revert `util/callchain.c` and confirmed that fixed the issue. To narrow it down, I tried commenting out every `maps__zput` that was added and later even every `map__zput` that already existed but nothing fixed the problem.

The only thing that did help was this change:

```diff
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 909f62b3b266..77215cedf35d 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1016,7 +1016,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
                        .maps = maps__get(list->ms.maps),
                        .map = map__get(list->ms.map),
                };
-               callchain_cursor_append(cursor, list->ip, &ms, false, NULL, 0, 0, 0, list->srcline);
+               callchain_cursor_append(cursor, list->ip, &list->ms, false, NULL, 0, 0, 0, list->srcline);
                list_del_init(&list->list);
                map__zput(ms.map);
                maps__zput(ms.maps);
```

I don't know _why_ this fixes it unless something weird is happening with the maps__get or map__get above...?

I was able to confirm on the most recent kernel tree checkout that the same patch still makes the problem go away. Of course, this is probably not the right fix but I don't really know what I'm doing...

Maybe this gives some hint about what may be wrong.

Thanks

--
Mateusz K.

  reply	other threads:[~2024-05-27  0:39 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <fd2b28b4-8fc5-45c1-824b-4de4422e8837@app.fastmail.com>
2024-05-24 20:17 ` Linux `perf report` regression Ian Rogers
2024-05-27  0:37   ` Mateusz Kowalczyk [this message]
2024-05-31  5:47     ` Namhyung Kim
2024-05-31 16:56       ` Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a8ea965e-1a88-4297-8ef7-8a8e80ae6a4a@app.fastmail.com \
    --to=fuuzetsu@fuuzetsu.co.uk \
    --cc=irogers@google.com \
    --cc=linux-perf-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).