linux-perf-users.vger.kernel.org archive mirror
From: James Clark <james.clark@arm.com>
To: Pablo Galindo Salgado <pablogsal@gmail.com>,
	linux-perf-users@vger.kernel.org
Subject: Re: Perf support in CPython
Date: Thu, 30 Nov 2023 11:00:38 +0000
Message-ID: <896b5786-8ed7-af6c-2c64-a24bb06a0d89@arm.com>
In-Reply-To: <CAFjbc8G+q4sFpO5D6PqbDFNbHHpgYZ_2ADV7sUp7Lw=AefprJA@mail.gmail.com>



On 20/11/2023 23:50, Pablo Galindo Salgado wrote:
> Hi,
> 
> I am Pablo Galindo Salgado from the CPython core development team. In
> the most recent release of CPython (Python 3.12) I added support for
> including Python function calls in the output of perf by leveraging
> the JIT interface support, i.e. writing to /tmp/perf-%d.map. The
> support works by compiling assembly trampolines at runtime that just
> call the Python bytecode evaluation loop, and assigning function names
> to the trampolines by writing them to /tmp/perf-%d.map. This has
> worked really well when CPython is compiled with frame pointers, but
> unfortunately CPython also suffers a 5% to 10% slowdown when frame
> pointers are used. Currently, perf cannot unwind through the JIT-ed
> trampolines without frame pointers.
> 

I doubt this will have any impact on the 5% - 10% slowdown, but I'll
mention it anyway just in case:

(At least on Arm) you can avoid "-mno-omit-leaf-frame-pointer", i.e.
leaf frame pointers can be omitted. This is because of this change that
we added to perf:

  void arch__add_leaf_frame_record_opts(struct record_opts *opts)
  {
     /* Also sample the user link register (LR) on every sample */
     opts->sample_user_regs |= sample_reg_masks[PERF_REG_ARM64_LR].mask;
  }

After sampling the register, even in FP unwind mode, we'll still try to
do a single step of DWARF unwinding for the last frame, using only the
link register as input and none of the stack contents. From the unwind
information for the sampled instruction, the unwinder knows whether the
link register contains the return address at that point, and there is a
high chance that it does.

We did this because, at least with some compilers,
"-fno-omit-frame-pointer" actually has no effect on leaf functions: the
frame pointer is still omitted even though you asked for it not to be
(because a leaf function doesn't need it). Although
"-mno-omit-leaf-frame-pointer" may fix this, you don't actually need it
as long as you sample the link register and have the DWARF information
available.
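To make the leaf-frame step above a bit more concrete, here is a rough sketch of the fix-up it amounts to on a user callchain. This is not perf's code: add_leaf_frame_caller() is a hypothetical name, and the ra_in_lr flag stands in for the single DWARF CFI step at the sampled PC. The idea is that if the leaf skipped its frame record, the FP walk jumps straight from the PC to the grandparent, so the sampled LR is inserted as the missing caller unless it already matches the next FP-derived entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not perf's implementation.
 * ips[0] is the sampled user PC and ips[1..n-1] came from the FP walk.
 * lr is the sampled link register; ra_in_lr stands in for the single DWARF
 * step at the PC and is true if the CFI says the return address is in LR.
 * Returns the new number of entries (never more than cap).
 */
static size_t add_leaf_frame_caller(uint64_t *ips, size_t n, size_t cap,
                                    uint64_t lr, bool ra_in_lr)
{
    assert(ips != NULL && n <= cap);

    /* No PC, or the CFI says LR does not hold the return address. */
    if (n == 0 || !ra_in_lr || lr == 0)
        return n;

    /* The FP walk already found this caller, so don't add it twice. */
    if (n > 1 && ips[1] == lr)
        return n;

    /* No room left: keep the FP-derived chain unchanged. */
    if (n == cap)
        return n;

    /* Shift the FP-derived callers down one slot and insert LR as the
     * caller of the leaf function. */
    for (size_t i = n; i > 1; i--)
        ips[i] = ips[i - 1];
    ips[1] = lr;
    return n + 1;
}
```

For example, with ips = {pc, 0x300, 0x400} and a sampled LR of 0x200 that the CFI says holds the return address, the chain becomes {pc, 0x200, 0x300, 0x400}.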

So I suppose to get that working you'd have to add the
JIT_CODE_UNWINDING_INFO stuff.
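For reference, this is roughly what emitting a JIT_CODE_UNWINDING_INFO record involves, as I read the jitdump specification (tools/perf/Documentation/jitdump-specification.txt). Treat the struct layout as something to check against that document rather than as authoritative. write_unwinding_info() is a hypothetical helper, and producing valid .eh_frame/.eh_frame_hdr bytes for the trampolines is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Record ids, as listed in the jitdump specification. */
enum jit_record_type {
    JIT_CODE_LOAD = 0,
    JIT_CODE_MOVE = 1,
    JIT_CODE_DEBUG_INFO = 2,
    JIT_CODE_CLOSE = 3,
    JIT_CODE_UNWINDING_INFO = 4,
};

/* Prefix shared by every jitdump record (native endianness). */
struct jr_prefix {
    uint32_t id;
    uint32_t total_size;   /* whole record: prefix + fields + payload */
    uint64_t timestamp;    /* e.g. CLOCK_MONOTONIC, matching perf record -k mono */
};

/* Fixed part of JIT_CODE_UNWINDING_INFO; the unwinding data follows it. */
struct jr_code_unwinding_info {
    struct jr_prefix p;
    uint64_t unwinding_size;    /* .eh_frame + .eh_frame_hdr bytes that follow */
    uint64_t eh_frame_hdr_size; /* size of the trailing .eh_frame_hdr */
    uint64_t mapped_size;       /* size of the unwinding data mapped in memory */
};

/* Hypothetical helper: write one record. Returns total_size, or 0 on error. */
static uint32_t write_unwinding_info(FILE *f, uint64_t timestamp,
                                     const void *eh_frame, uint64_t eh_frame_size,
                                     const void *eh_frame_hdr, uint64_t eh_frame_hdr_size,
                                     uint64_t mapped_size)
{
    assert(f != NULL);
    uint64_t payload = eh_frame_size + eh_frame_hdr_size;
    uint64_t total = sizeof(struct jr_code_unwinding_info) + payload;
    if (total > UINT32_MAX)
        return 0;

    struct jr_code_unwinding_info rec = {
        .p = { .id = JIT_CODE_UNWINDING_INFO, .total_size = (uint32_t)total,
               .timestamp = timestamp },
        .unwinding_size = payload,
        .eh_frame_hdr_size = eh_frame_hdr_size,
        .mapped_size = mapped_size,
    };

    /* Fixed fields, then .eh_frame, then .eh_frame_hdr at the end. */
    if (fwrite(&rec, sizeof rec, 1, f) != 1)
        return 0;
    if (eh_frame_size && fwrite(eh_frame, (size_t)eh_frame_size, 1, f) != 1)
        return 0;
    if (eh_frame_hdr_size &&
        fwrite(eh_frame_hdr, (size_t)eh_frame_hdr_size, 1, f) != 1)
        return 0;
    return (uint32_t)total;
}
```

As far as I can tell from tools/perf/util/jitdump.c, perf inject -j remembers the last unwinding record and applies it to the next JIT_CODE_LOAD. If that is right, the record needs to be written just before the code load it describes.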

> I am looking to extend support to the jitdump specification to allow
> perf to unwind through these trampolines when using DWARF unwinding
> mode.

Do you really want to use full DWARF mode? For that to work, perf has to
copy the user stack (up to the dump size) on every sample, so it seems a
bit slow. But maybe it's not an issue at lower sampling frequencies. If
the application has a huge stack, it doesn't really scale.

Is that just so you have a mode that works out of the box without any
recompilation of Python? But not necessarily the best way to do it?
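As a rough sense of the cost: in DWARF mode perf record defaults to an 8192-byte stack dump per sample (--call-graph dwarf,<size> changes it) and to a 4000 Hz sampling frequency. That means each continuously busy thread produces about 8192 * 4000 = 32,768,000 bytes (roughly 33 MB) of stack copies per second. The hypothetical helper below only spells out that arithmetic. It ignores sample headers, registers and everything else in the record.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical back-of-the-envelope helper: raw user-stack bytes copied per
 * second in DWARF mode. Ignores sample headers, registers and other fields.
 */
static uint64_t dwarf_stack_bytes_per_sec(uint64_t dump_size,
                                          uint64_t freq_hz,
                                          uint64_t busy_threads)
{
    assert(dump_size > 0 && freq_hz > 0);
    return dump_size * freq_hz * busy_threads;
}
```

Lowering -F or the dump size reduces this linearly. A smaller dump, though, truncates deep stacks, which is the scalability problem mentioned above.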

I don't know if it's possible to have some kind of hybrid approach where
only the trampolines have frame pointers and none of the other code
does, so you don't get the overhead issue. Or would that also fall
apart in the kernel's FP unwinder?
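For anyone following along, here is a minimal sketch of what a frame-pointer walk has to work with. It is loosely modelled on the arm64 kernel's user-space callchain walk, but it runs over a simulated stack instead of user memory. struct sim_stack and fp_unwind() are hypothetical. The walk only sees the chain of {saved FP, saved LR} frame records starting from the FP register. Any frame between the sampled PC and a trampoline that may reuse the FP register as a general-purpose register can therefore break the chain at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated user stack: mem[i] is the 8-byte word at address base + 8*i. */
struct sim_stack {
    const uint64_t *mem;
    uint64_t base;
    size_t nwords;
};

/* Read one aligned word; returns 0 if addr is outside the simulated stack
 * (a real walker would see a failed user-memory copy instead). */
static int read_word(const struct sim_stack *s, uint64_t addr, uint64_t *out)
{
    if (addr < s->base || (addr - s->base) % 8 != 0)
        return 0;
    uint64_t idx = (addr - s->base) / 8;
    if (idx >= s->nwords)
        return 0;
    *out = s->mem[idx];
    return 1;
}

/* Frame-pointer walk: ips[0] = PC, then the saved LR of each frame record.
 * A frame record is {saved FP, saved LR} at the address held in FP. */
static size_t fp_unwind(const struct sim_stack *s, uint64_t pc, uint64_t fp,
                        uint64_t *ips, size_t max)
{
    assert(s != NULL && ips != NULL);
    size_t n = 0;
    if (max == 0)
        return 0;
    ips[n++] = pc;
    while (n < max && fp != 0) {
        uint64_t next_fp, ret;
        if (!read_word(s, fp, &next_fp) || !read_word(s, fp + 8, &ret))
            break;          /* FP does not point at a readable record */
        ips[n++] = ret;
        if (next_fp <= fp)
            break;          /* records must move towards higher addresses */
        fp = next_fp;
    }
    return n;
}
```

For example, take records {0x1010, 0xA} at 0x1000, {0x1020, 0xB} at 0x1010 and {0, 0xC} at 0x1020. Starting from FP = 0x1000 gives PC, 0xA, 0xB, 0xC, while a garbage FP pointing outside the stack gives only the PC.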

> I have a draft patch
> (https://github.com/python/cpython/pull/112254) that adds support for
> the specification by writing JIT_CODE_LOAD and
> JIT_CODE_UNWINDING_INFO records (currently with empty EH Frame entries
> to force perf to defer to FP-based unwinding). This works well on my
> current distribution (Arch Linux with perf version 6.5), but it falls
> apart when tested on other distributions.
> 
> * One problem I have struggled to debug is that even when CPython is
> compiled with frame pointers and without emitting
> JIT_CODE_UNWINDING_INFO (using FP-based unwinding), in some cases
> (such as the latest Ubuntu versions) the shared objects and addresses
> generated by perf inject -j look sane and the addresses match, but
> perf report and perf script do not include the symbol names in their
> output. I do not understand why this is happening, and my debugging so
> far has not given me any thread I can pull, so I wanted to reach out
> to this mailing list for assistance.
> 
> * The other problem I am finding is adding eh_frame data for our
> trampolines, as deferring to FP unwinding looks unreliable even
> though our trampolines are trivial. This is a separate problem, but my
> attempts to add EH frames have failed so far, and I don't know how to
> debug them properly, as perf doesn't seem to expect some of the values
> we are writing. I would love it if someone with perf and DWARF
> expertise could help steer us in the CPython team in the right
> direction.
> 
> Thanks a lot in advance,
> Pablo Galindo Salgado
> 


Thread overview: 11+ messages
2023-11-20 23:50 Perf support in CPython Pablo Galindo Salgado
2023-11-21 18:15 ` Namhyung Kim
2023-11-21 18:30   ` Pablo Galindo Salgado
2023-11-21 18:49     ` Namhyung Kim
2023-11-22 21:04 ` Ian Rogers
2023-11-30  6:39   ` Namhyung Kim
2023-11-30 11:00 ` James Clark [this message]
2023-11-30 12:42   ` Pablo Galindo Salgado
2023-11-30 18:09     ` James Clark
2023-11-30 21:16     ` Ian Rogers
2023-12-01  9:57       ` James Clark
