From: Josh Poimboeuf <jpoimboe@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Ingo Molnar <mingo@kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
Indu Bhagat <indu.bhagat@oracle.com>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Adrian Hunter <adrian.hunter@intel.com>,
linux-perf-users@vger.kernel.org, Mark Brown <broonie@kernel.org>,
linux-toolchains@vger.kernel.org
Subject: [PATCH RFC 00/10] perf: user space sframe unwinding
Date: Wed, 8 Nov 2023 16:41:05 -0800
Message-ID: <cover.1699487758.git.jpoimboe@kernel.org>

Some distros have started compiling frame pointers into all their
packages to enable the kernel to do system-wide profiling of user space.
Unfortunately that creates a runtime performance penalty across the
entire system. Unwinding with DWARF (or .eh_frame) in the kernel
instead isn't feasible because it is too complex and too slow.

For in-kernel unwinding we solved this problem by creating the ORC
unwinder for x86_64. Similarly, for user space the GNU assembler can
generate the SFrame ("Simple Frame") format starting with binutils
2.40.

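To give a feel for how simple the format is, here is a minimal sketch
that checks whether a buffer starts with a plausible SFrame preamble.
The field layout (16-bit magic, 8-bit version, 8-bit flags) and the
magic value 0xdee2 are my reading of the SFrame spec shipped with
binutils, so verify them against the spec before relying on them. The
function name sframe_check_preamble() is made up for this example and
is not part of these patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, taken from my reading of the binutils SFrame spec. */
#define SFRAME_MAGIC		0xdee2
#define SFRAME_VERSION_1	1

/* First four bytes of an .sframe section (assumed layout). */
struct sframe_preamble {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
};

/*
 * Hypothetical helper: returns 0 and fills *out if buf starts with an
 * SFrame preamble in host byte order, -1 otherwise.
 */
static int sframe_check_preamble(const void *buf, size_t len,
				 struct sframe_preamble *out)
{
	struct sframe_preamble p;

	assert(buf && out);

	if (len < sizeof(p))
		return -1;

	/* Copy out instead of casting, buf may be unaligned. */
	memcpy(&p, buf, sizeof(p));

	if (p.magic != SFRAME_MAGIC)
		return -1;

	*out = p;
	return 0;
}
```

A caller would pass the first bytes of a mapped .sframe section and
reject the section if the check fails or the version is not one it
understands.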
These patches add support for unwinding user space from the kernel using
SFrame, with perf as the first user. It should be easy to add user
unwinding support for other components like ftrace.

I tested it on Gentoo by recompiling everything with -Wa,-gsframe and
using a custom glibc patch (which I'll send in a reply to this email).

The unwinding itself seems to work well, though I still have a major
problem: how to tell the perf tool to stitch the separate kernel and
user callchains together into a single event?

Right now I have a hack which somehow causes the perf tool to overwrite
the kernel callchain with the user one. I'm perf-clueless, so any ideas
or patches for a clean way to implement that would be most helpful.

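To make the question concrete, here is a rough sketch of what stitching
could look like on the tool side. It is only an illustration of the
idea, not what the hack does and not the real perf record format: the
toy_sample and toy_deferred structs, the user_deferred flag and
toy_stitch() are all hypothetical, and matching purely on tid is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_FRAMES 64

/* Hypothetical sample carrying only the kernel part of the callchain. */
struct toy_sample {
	int      tid;
	int      user_deferred;	/* user part will arrive in a later record */
	size_t   nr;
	uint64_t ips[TOY_MAX_FRAMES];
};

/* Hypothetical later record carrying the user part for one thread. */
struct toy_deferred {
	int      tid;
	size_t   nr;
	uint64_t ips[TOY_MAX_FRAMES];
};

/*
 * Append the deferred user frames to the pending sample of the same
 * thread.  Returns 0 on success, -1 if the sample is not waiting for
 * user frames, the tids differ, or the result would not fit.
 */
static int toy_stitch(struct toy_sample *s, const struct toy_deferred *d)
{
	assert(s && d);

	if (!s->user_deferred || s->tid != d->tid)
		return -1;
	if (s->nr > TOY_MAX_FRAMES || d->nr > TOY_MAX_FRAMES - s->nr)
		return -1;

	/* Kernel frames stay first, user frames follow. */
	memcpy(&s->ips[s->nr], d->ips, d->nr * sizeof(d->ips[0]));
	s->nr += d->nr;
	s->user_deferred = 0;
	return 0;
}
```

The point of the sketch is that the kernel frames are kept and the user
frames are appended, rather than one callchain overwriting the other.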
Otherwise there were two main challenges:

1) Finding .sframe sections in shared/dlopened libraries

   The kernel has no visibility into the contents of shared libraries.
   This was solved by adding a PR_ADD_SFRAME option to prctl() which
   allows the runtime linker to manually provide the in-memory address
   of an .sframe section to the kernel.

2) Dealing with page faults

   Keeping all binaries' sframe data pinned would likely waste a lot of
   memory. Instead, read it from user space on demand. Reading user
   memory can fault, and page faults can't be handled in perf NMI
   context, so defer the unwind to the next exit to user space. Since
   the NMI handler doesn't do exit work, self-IPI and then schedule
   task work to be run on exit from the IPI.

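The two pieces above fit together roughly as in the following user
space toy model. It is a sketch for illustration only, not the kernel
code in this series: every toy_* name is hypothetical, the three
arguments to toy_add_sframe() are an assumption (the text above only
says the in-memory address is passed), and the self-IPI and task work
steps are collapsed into a single pending flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SECTIONS 8

/* Hypothetical record of one registered .sframe section. */
struct toy_sframe_section {
	uint64_t sframe_addr;	/* in-memory address of .sframe */
	uint64_t text_start;	/* text range it is assumed to describe */
	uint64_t text_end;
};

struct toy_task {
	struct toy_sframe_section sections[TOY_MAX_SECTIONS];
	size_t   nr_sections;
	int      unwind_pending;	/* set in "NMI", cleared on user exit */
	uint64_t sampled_ip;
	uint64_t unwound_with;	/* sframe_addr used by the last unwind */
};

/* Models the runtime linker registering a section with the kernel. */
static int toy_add_sframe(struct toy_task *t, uint64_t sframe_addr,
			  uint64_t text_start, uint64_t text_end)
{
	assert(t);

	if (t->nr_sections == TOY_MAX_SECTIONS || text_start >= text_end)
		return -1;

	t->sections[t->nr_sections++] = (struct toy_sframe_section){
		sframe_addr, text_start, text_end };
	return 0;
}

/* "NMI" side: must not touch user memory, only note that work is due. */
static void toy_nmi_sample(struct toy_task *t, uint64_t user_ip)
{
	t->sampled_ip = user_ip;
	t->unwind_pending = 1;
}

/*
 * "Exit to user" side: faulting is fine here, so do the real lookup.
 * Returns 0 if nothing was pending, 1 if a covering section was found,
 * -1 if no registered section covers the sampled IP.
 */
static int toy_exit_to_user(struct toy_task *t)
{
	size_t i;

	if (!t->unwind_pending)
		return 0;
	t->unwind_pending = 0;

	for (i = 0; i < t->nr_sections; i++) {
		const struct toy_sframe_section *s = &t->sections[i];

		if (t->sampled_ip >= s->text_start &&
		    t->sampled_ip < s->text_end) {
			t->unwound_with = s->sframe_addr;
			return 1;
		}
	}
	return -1;
}
```

In this model the sampling path stays trivially safe because it never
dereferences anything user-provided, and all of the section lookup cost
is paid once per exit to user space.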
Special thanks to Indu for the original concept, and to Steven and Peter
for helping a lot with the design. And to Steven for letting me do it ;-)

TODO:

- Stitch kernel+user events together in perf tool (help needed)
- Add arm64 support
- Add VDSO .sframe support
- Allow specifying FP vs sframe from perf tool? Right now it's
  auto-detected, maybe that's enough
- Port ftrace and others to use sframe
- Support sframe v2
- Determine the impact of missing DRAP support (aligned stacks which
  SFrame doesn't currently support)
- Add debugging hooks

Josh Poimboeuf (10):
  perf: Remove get_perf_callchain() 'init_nr' argument
  perf: Remove get_perf_callchain() 'crosstask' argument
  perf: Simplify get_perf_callchain() user logic
  perf: Introduce deferred user callchains
  perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED
  unwind: Introduce generic user space unwinding interfaces
  unwind/x86: Add HAVE_USER_UNWIND
  perf/x86: Use user_unwind interface
  unwind: Introduce SFrame user space unwinding
  unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME

arch/Kconfig | 9 +
arch/x86/Kconfig | 3 +
arch/x86/events/core.c | 65 ++---
arch/x86/include/asm/mmu.h | 2 +-
arch/x86/include/asm/user_unwind.h | 11 +
fs/binfmt_elf.c | 46 +++-
include/linux/mm_types.h | 3 +
include/linux/perf_event.h | 24 +-
include/linux/sframe.h | 46 ++++
include/linux/user_unwind.h | 33 +++
include/uapi/linux/elf.h | 1 +
include/uapi/linux/perf_event.h | 1 +
include/uapi/linux/prctl.h | 3 +
kernel/Makefile | 1 +
kernel/bpf/stackmap.c | 6 +-
kernel/events/callchain.c | 39 ++-
kernel/events/core.c | 96 ++++++-
kernel/fork.c | 10 +
kernel/sys.c | 11 +
kernel/unwind/Makefile | 2 +
kernel/unwind/sframe.c | 414 +++++++++++++++++++++++++++++
kernel/unwind/sframe.h | 217 +++++++++++++++
kernel/unwind/user.c | 86 ++++++
mm/init-mm.c | 2 +
24 files changed, 1060 insertions(+), 71 deletions(-)
create mode 100644 arch/x86/include/asm/user_unwind.h
create mode 100644 include/linux/sframe.h
create mode 100644 include/linux/user_unwind.h
create mode 100644 kernel/unwind/Makefile
create mode 100644 kernel/unwind/sframe.c
create mode 100644 kernel/unwind/sframe.h
create mode 100644 kernel/unwind/user.c
--
2.41.0

Thread overview: 35+ messages
2023-11-09 0:41 Josh Poimboeuf [this message]
2023-11-09 0:41 ` [PATCH RFC 01/10] perf: Remove get_perf_callchain() 'init_nr' argument Josh Poimboeuf
2023-11-11 6:09 ` Namhyung Kim
2023-11-09 0:41 ` [PATCH RFC 02/10] perf: Remove get_perf_callchain() 'crosstask' argument Josh Poimboeuf
2023-11-11 6:11 ` Namhyung Kim
2023-11-11 20:53 ` Jordan Rome
2023-11-09 0:41 ` [PATCH RFC 03/10] perf: Simplify get_perf_callchain() user logic Josh Poimboeuf
2023-11-11 6:11 ` Namhyung Kim
2023-11-09 0:41 ` [PATCH RFC 04/10] perf: Introduce deferred user callchains Josh Poimboeuf
2023-11-11 6:57 ` Namhyung Kim
2023-11-11 18:49 ` Josh Poimboeuf
2023-11-11 18:54 ` Josh Poimboeuf
2023-11-13 16:56 ` Namhyung Kim
2023-11-13 17:21 ` Peter Zijlstra
2023-11-13 17:48 ` Namhyung Kim
2023-11-13 18:49 ` Peter Zijlstra
2023-11-13 19:16 ` Namhyung Kim
2023-11-15 16:13 ` Namhyung Kim
2023-11-20 14:03 ` Peter Zijlstra
2024-09-13 13:08 ` Josh Poimboeuf
2024-09-13 13:36 ` Peter Zijlstra
2024-09-13 13:53 ` Josh Poimboeuf
2024-09-13 14:47 ` Josh Poimboeuf
2024-09-13 15:26 ` Peter Zijlstra
2023-11-09 0:41 ` [PATCH RFC 05/10] perf/x86: Add HAVE_PERF_CALLCHAIN_DEFERRED Josh Poimboeuf
2023-11-09 0:41 ` [PATCH RFC 06/10] unwind: Introduce generic user space unwinding interfaces Josh Poimboeuf
2023-11-09 0:41 ` [PATCH RFC 07/10] unwind/x86: Add HAVE_USER_UNWIND Josh Poimboeuf
2023-11-09 0:41 ` [PATCH RFC 08/10] perf/x86: Use user_unwind interface Josh Poimboeuf
2023-11-09 0:41 ` [PATCH RFC 09/10] unwind: Introduce SFrame user space unwinding Josh Poimboeuf
2023-11-09 19:31 ` Indu Bhagat
2023-11-09 19:37 ` Josh Poimboeuf
2023-11-09 19:49 ` Steven Rostedt
2023-11-09 19:53 ` Josh Poimboeuf
2023-11-09 0:41 ` [PATCH RFC 10/10] unwind/x86/64: Add HAVE_USER_UNWIND_SFRAME Josh Poimboeuf
2023-11-09 0:45 ` [PATCH RFC 00/10] perf: user space sframe unwinding Josh Poimboeuf