From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAE00C77B7C for ; Mon, 1 May 2023 23:00:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230484AbjEAXAZ (ORCPT ); Mon, 1 May 2023 19:00:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43362 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229822AbjEAXAY (ORCPT ); Mon, 1 May 2023 19:00:24 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E90B2D63 for ; Mon, 1 May 2023 16:00:22 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6CA0461591 for ; Mon, 1 May 2023 23:00:21 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id ECD50C433D2; Mon, 1 May 2023 23:00:19 +0000 (UTC) Date: Mon, 1 May 2023 19:00:18 -0400 From: Steven Rostedt To: Indu Bhagat Cc: linux-toolchains@vger.kernel.org, daandemeyer@meta.com, andrii@kernel.org, kris.van.hees@oracle.com, elena.zannoni@oracle.com, nick.alcock@oracle.com Subject: Re: [POC 4/5] sframe: add an SFrame format stack tracer Message-ID: <20230501190018.24ae7704@gandalf.local.home> In-Reply-To: <20230501200410.3973453-5-indu.bhagat@oracle.com> References: <20230501200410.3973453-1-indu.bhagat@oracle.com> <20230501200410.3973453-5-indu.bhagat@oracle.com> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-toolchains@vger.kernel.org On Mon, 1 May 2023 13:04:09 -0700 Indu Bhagat wrote: > This patch adds an SFrame format based stack tracer. > > The files iterate_phdr.c, iterate_phdr.h implement a dl_iterate_phdr() > like functionality. > > The SFrame format based stack tracer is implemented in the > sframe_unwind.c with architecture specific bits in the > arch/arm64/include/asm/sframe_regs.h and > arch/x86/include/asm/sframe_regs.h. Please note that the SFrame format > is supported for x86_64 (AMD64 ABI) and aarch64 (AAPCS64 ABI) only at > this time. > > The files sframe_state.[ch] implement the SFrame state management APIs. > > Some aspects of the implementation are "POC like". These will need to > addressed for the implementation to become more palatable: > - dealing with only Elf64_Phdr (no Elf32_Phdr) at this time, and other > TODOs in the iterate_phdr.c, > - detecting whether a program did a dlopen/dlclose, > - code stubs around user space memory access (.sframe section, ELF hdr > etc.) by the kernel need careful review. > > There are more aspects than above; The intention of this patch set is to > help drive the discussion on how to best incorporate an SFrame-based user > space unwinder in the kernel. > > Signed-off-by: Indu Bhagat > --- > arch/arm64/include/asm/sframe_regs.h | 37 +++ > arch/x86/include/asm/sframe_regs.h | 34 +++ > include/sframe/sframe_regs.h | 11 + > include/sframe/sframe_unwind.h | 62 ++++ > lib/sframe/Makefile | 8 +- > lib/sframe/iterate_phdr.c | 113 +++++++ > lib/sframe/iterate_phdr.h | 34 +++ > lib/sframe/sframe_state.c | 424 +++++++++++++++++++++++++++ > lib/sframe/sframe_state.h | 80 +++++ > lib/sframe/sframe_unwind.c | 208 +++++++++++++ > 10 files changed, 1010 insertions(+), 1 deletion(-) > create mode 100644 arch/arm64/include/asm/sframe_regs.h > create mode 100644 arch/x86/include/asm/sframe_regs.h > create mode 100644 include/sframe/sframe_regs.h > create mode 100644 include/sframe/sframe_unwind.h > create mode 100644 lib/sframe/iterate_phdr.c > create mode 100644 lib/sframe/iterate_phdr.h > create mode 100644 lib/sframe/sframe_state.c > create mode 100644 lib/sframe/sframe_state.h > create mode 100644 lib/sframe/sframe_unwind.c > > diff --git a/arch/arm64/include/asm/sframe_regs.h b/arch/arm64/include/asm/sframe_regs.h > new file mode 100644 > index 000000000000..ae9ab9d5d3c1 > --- /dev/null > +++ b/arch/arm64/include/asm/sframe_regs.h > @@ -0,0 +1,37 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Copyright (C) 2023, Oracle and/or its affiliates. > + */ > + > +#ifdef ASM_ARM64_SFRAME_REGS_H > +#define ASM_ARM64_SFRAME_REGS_H > + > +#define STACK_ACCESS_LEN 8 > + > +static inline uint64_t > +get_ptregs_ip(struct pt_regs *regs) > +{ > + return regs->pc; > +} > + > +static inline uint64_t > +get_ptregs_sp(struct pt_regs *regs) > +{ > + return regs->sp; > +} > + > +static inline uint64_t > +get_ptregs_fp(struct pt_regs *regs) > +{ > +#define UNWIND_AARCH64_X29 29 /* 64-bit frame pointer. */ > + return (uint64_t)regs->regs[UNWIND_AARCH64_X29]; > +} > + > +static inline uint64_t > +get_ptregs_ra(struct pt_regs *regs) > +{ > +#define UNWIND_AARCH64_X30 30 /* 64-bit link pointer. */ > + return (uint64_t)regs->regs[UNWIND_AARCH64_X30]; > +} > + > +#endif /* ASM_ARM64_SFRAME_REGS_H */ > diff --git a/arch/x86/include/asm/sframe_regs.h b/arch/x86/include/asm/sframe_regs.h > new file mode 100644 > index 000000000000..99f84955854a > --- /dev/null > +++ b/arch/x86/include/asm/sframe_regs.h > @@ -0,0 +1,34 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Copyright (C) 2023, Oracle and/or its affiliates. > + */ > + > +#ifndef ASM_X86_SFRAME_REGS_H > +#define ASM_X86_SFRAME_REGS_H > + > +#define STACK_ACCESS_LEN 8 > + > +static inline uint64_t > +get_ptregs_ip(struct pt_regs *regs) > +{ > + return (uint64_t)regs->ip; > +} > + > +static inline uint64_t > +get_ptregs_sp(struct pt_regs *regs) > +{ > + return (uint64_t)regs->sp; > +} > + > +static inline uint64_t > +get_ptregs_fp(struct pt_regs *regs) > +{ > + return (uint64_t)regs->bp; > +} > + > +static inline uint64_t > +get_ptregs_ra(struct pt_regs *regs) > +{ > + return 0; /* SFRAME_CFA_FIXED_RA_INVALID */ > +} > +#endif /* ASM_X86_SFRAME_REGS_H */ > diff --git a/include/sframe/sframe_regs.h b/include/sframe/sframe_regs.h > new file mode 100644 > index 000000000000..32b67f7a7c78 > --- /dev/null > +++ b/include/sframe/sframe_regs.h > @@ -0,0 +1,11 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Copyright (C) 2023, Oracle and/or its affiliates. > + */ > + > +#ifndef _SFRAME_REGS_H > +#define _SFRAME_REGS_H > + > +#include > + > +#endif /* _SFRAME_REGS_H */ > diff --git a/include/sframe/sframe_unwind.h b/include/sframe/sframe_unwind.h > new file mode 100644 > index 000000000000..3e2c12816b60 > --- /dev/null > +++ b/include/sframe/sframe_unwind.h Also, these should probably go into include/linux, Unless there's going to be a lot more header files. > @@ -0,0 +1,62 @@ > +/* SPDX-License-Identifier: GPL-2.0-or-later */ > +/* > + * Copyright (C) 2023, Oracle and/or its affiliates. > + */ > + > +#ifndef _SFRAME_UNWIND_H > +#define _SFRAME_UNWIND_H > + > +#include > +#include > + > +#define PT_GNU_SFRAME 0x6474e554 > + > +/* > + * State used for SFrame-based stack tracing for a user space task. > + */ > +struct user_unwind_state { > + uint64_t pc, sp, fp, ra; I know this is POC, but please make each structure field a separate item. Also, should be tab delimited. > + enum stack_type stype; > + struct task_struct *task; > + bool error; > +}; Also swap the task and the stype, as the pointer to the task will create a hole in the structure. struct user_unwind_state { uint64_t pc; uint64_t sp; uint64_t fp; uint64_t ra; struct task_stuct *task; enum stack_type stype; bool error; }; > + > +/* > + * APIs for an SFrame based stack tracer. > + */ > + > +void sframe_unwind_start(struct user_unwind_state *state, > + struct task_struct *task, struct pt_regs *regs); > +bool sframe_unwind_next_frame(struct user_unwind_state *state); > +uint64_t sframe_unwind_get_return_address(struct user_unwind_state *state); > + > +static inline bool sframe_unwind_done(struct user_unwind_state *state) > +{ > + return state->stype == STACK_TYPE_UNKNOWN; > +} > + > +static inline bool sframe_unwind_error(struct user_unwind_state *state) > +{ > + return state->error; > +} > + > +/* > + * APIs to manage the SFrame state per task for stack tracing. > + */ > + > +extern struct sframe_state *unwind_sframe_state_alloc(struct task_struct *task); > +extern int unwind_sframe_state_update(struct task_struct *task); > +extern void unwind_sframe_state_cleanup(struct task_struct *task); > + > +extern bool unwind_sframe_state_valid_p(struct sframe_state *sfstate); > +extern bool unwind_sframe_state_ready_p(struct sframe_state *sftate); > + > +/* > + * Get the callchain using SFrame unwind info for the given task. > + */ > +extern int > +sframe_callchain_user(struct task_struct *task, > + struct perf_callchain_entry_ctx *entry, > + struct pt_regs *regs); I plan on using this without any perf involvement, I'd like to keep perf separate from the sframe logic. As I mentioned in a previous email, I expect sframe to have callbacks. So the callchain format should be defined by sframe, and not reuse perf. > + > +#endif /* _SFRAME_UNWIND_H */ > diff --git a/lib/sframe/Makefile b/lib/sframe/Makefile > index 4e4291d9294f..5ee9e3e7ec93 100644 > --- a/lib/sframe/Makefile > +++ b/lib/sframe/Makefile > @@ -1,5 +1,11 @@ > # SPDX-License-Identifier: GPL-2.0 > ################################## > -obj-$(CONFIG_USER_UNWINDER_SFRAME) += sframe_read.o \ > +obj-$(CONFIG_USER_UNWINDER_SFRAME) += iterate_phdr.o \ > + sframe_read.o \ > + sframe_state.o \ > + sframe_unwind.o Ah, the backslash is fixed here. > > +CFLAGS_iterate_phdr.o += -I $(srctree)/lib/sframe/ -Wno-error=declaration-after-statement > CFLAGS_sframe_read.o += -I $(srctree)/lib/sframe/ > +CFLAGS_sframe_state.o += -I $(srctree)/lib/sframe/ > +CFLAGS_sframe_unwind.o += -I $(srctree)/lib/sframe/ > diff --git a/lib/sframe/iterate_phdr.c b/lib/sframe/iterate_phdr.c > new file mode 100644 > index 000000000000..c10d590ecc67 > --- /dev/null > +++ b/lib/sframe/iterate_phdr.c > @@ -0,0 +1,113 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* > + * Copyright (C) 2023, Oracle and/or its affiliates. > + */ > + > +#include > +#include > +#include > +#include > + > +#include "iterate_phdr.h" > + > +/* > + * Iterate over the task's memory mappings and find the ELF headers. > + * > + * This is expected to be called from perf_callchain_user(), so user process > + * context is expected. My thought is that this will be called in the ptrace path (not the perf path), so yes, it will be in user process context. > + */ > + > +int iterate_phdr(int (*callback)(struct phdr_info *info, > + struct task_struct *task, > + void *data), > + struct task_struct *task, void *data) > +{ > + struct mm_struct *mm; > + struct vm_area_struct *vma_mt; > + struct page *page; > + > + Elf64_Ehdr *ehdr; > + struct phdr_info phinfo; > + > + int ret = 0, res = 0; > + int err = 0; > + bool first = true; > + > + memset(&phinfo, 0, sizeof(struct phdr_info)); > + > + mm = task->mm; > + > + MA_STATE(mas, &mm->mm_mt, 0, 0); > + So this is the code I want to discuss at LSFMM :-) As there will be more experts about this than what I know. Let me go and start making the infrastructure to encompass this. -- Steve > + mas_for_each(&mas, vma_mt, ULONG_MAX) { > + /* ELF header has a fixed place in the file, starting at offset > + * zero. > + */ > + if (vma_mt->vm_pgoff) > + continue; > + > + /* For the callback to infer if its the prog or DSO we are > + * dealing with. > + */ > + phinfo.pi_prog = first; > + first = false; > + /* FIXME TODO > + * - This code assumes 64-bit ELF by using Elf64_Ehdr. > + * - Detect the case when ELF program headers to be of > + * size > 1 page. > + */ > + > + /* FIXME TODO KERNEL > + * - get_user_pages_WHAT, which API. > + * What flags ? Is this correct ? > + */ > + ret = get_user_pages_remote(mm, vma_mt->vm_start, 1, FOLL_GET, > + &page, &vma_mt, NULL); > + if (ret <= 0) > + continue; > + > + /* The first page must have the ELF header. */ > + ehdr = vmap(&page, 1, VM_MAP, PAGE_KERNEL); > + if (!ehdr) > + goto put_page; > + > + /* Check for magic bytes to make sure this is ehdr. */ > + err = 0; > + err |= ((ehdr->e_ident[EI_MAG0] != ELFMAG0) > + || (ehdr->e_ident[EI_MAG1] != ELFMAG1) > + || (ehdr->e_ident[EI_MAG2] != ELFMAG2) > + || (ehdr->e_ident[EI_MAG3] != ELFMAG3)); > + if (err) > + goto unmap; > + > + /* > + * FIXME TODO handle the case when number of program headers is > + * greater than or equal to PN_XNUM later. > + */ > + if (ehdr->e_phnum == PN_XNUM) > + goto unmap; > + /* > + * FIXME TODO handle the case when Elf phdrs span more than one > + * page later ? > + */ > + if ((sizeof(Elf64_Ehdr) + ehdr->e_phentsize * ehdr->e_phnum) > + > PAGE_SIZE) > + goto unmap; > + > + /* Save the location of program headers and the phnum. */ > + phinfo.pi_addr = vma_mt->vm_start; > + phinfo.pi_phdr = (void *)ehdr + ehdr->e_phoff; > + phinfo.pi_phnum = ehdr->e_phnum; > + > + res = callback(&phinfo, task, data); > +unmap: > + vunmap(ehdr); > +put_page: > + put_page(page); > + > + if (res < 0) > + break; > + } > + > + return res; > +} >