From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D48C4C54FB9 for ; Tue, 21 Nov 2023 09:03:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=e7ECQqWBN2fJrgwOKGyy43dkVsvx9YUWQ37baKs0FNI=; b=YVzt3CrPgS0kfI 1qKRWngUpCZ6t9wX9mgpxYVAaR7fbMJabxGLAJByuv3hoHLl87RoN5NQtaAWplH7KpI2bWT10seEu +qOYABjk3WU6oUtB/xM/vsf4TUmeXFOAUL6HTksNv6OttklGZ0SA1bH6HBcbIzKKNHNo8zal4Win4 iUl1hpcXv+yhsz27ip3fak2RE8o+B+Nx/J6Pkn9a0HfrlaHvQSchE7Gws5Ptd+IhXzfV2uYff6Jve wvT5erPLBznh787jIQpomdRj3qXyCmwHGwc86KNDS9zjPf8vgT8KgqrXEHwl5XZGPMkDxxNE8mwJU AHAj1NXimEkVsqU5h0uA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1r5Mf0-00G4Gy-2w; Tue, 21 Nov 2023 09:03:10 +0000 Received: from mail-pl1-x633.google.com ([2607:f8b0:4864:20::633]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1r5Mew-00G4GM-2u for linux-arm-kernel@lists.infradead.org; Tue, 21 Nov 2023 09:03:09 +0000 Received: by mail-pl1-x633.google.com with SMTP id d9443c01a7336-1cc2575dfc7so37373235ad.1 for ; Tue, 21 Nov 2023 01:03:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1700557384; x=1701162184; darn=lists.infradead.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Q5SCcVPcA9NKrByS121TAkfRvdi1lQr2Qu3W04vx7Xs=; b=c9TqFVs5Z9bTo2Mtg94VLEgpvIxAdNfRtJdm/jDq73Yjw0njCwfD5odWZfsfj/r+yx p3fZiGWoVqRhmKdPgukavEYQ4d24vm0z5nEd5ALApY58ejU1WWWyLd8QutUSXpjMN9DX ox8dDjfkUGoLHTad08Au3Gz8hkMM3PbBz5ZQJ4AVGbusVSWrmx0egaZzeoESpOUge3fb 18plPjwWW78Saaom7uaBxj1/RDpcIxZwQMXyfAP66zz8Y/V7K2IDBGzcnZHigtkzjiLP Bnh2payLxkCKCyDCMu4KYPGhmizqwYIa215F6gLxFEU0ckoAF/9a7jarA/b/DusK74rH BtIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700557384; x=1701162184; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Q5SCcVPcA9NKrByS121TAkfRvdi1lQr2Qu3W04vx7Xs=; b=h/S1NVo6j7ElEFfByaeOGT+9/KO4STOEKlQHigyK4/3aoyUY3I+0vyKvyoeL/W6EeH MaoYCrHbkCp1WOomBbPWOoaJVgmdae9EVQ2TDXhvgXzAtWx7kZ3rXGIGajvmdUiPMr6e ZD5JgUm0tV2Xo9iXnO1+J3eDC3o7/756DP5Wv0MaZKekvFDxT4IUeVev5SBOCSg15zWq mQMNRr9CFOaPT6t1vnm7MQnO1cTreVXFu8Zd1awOYNtkuJOo6u+Pqg1sKEvKEv1i3Q3M DAptHr6pM1PKGCW0WpMl2lc9C5GjRXcjr2UUTYP1h1gY9H4031vy3mVzbtlMXTBut/Jn 7qhw== X-Gm-Message-State: AOJu0Yy6SzM/4pA4L4BGpNYVLEyDRgN9gnQqIZkDAU4EdYal8pbT7XJ/ lXYSXUcgqCbZx/cCb3Lx3suM37zYD3A= X-Google-Smtp-Source: AGHT+IEbRiW8XYHrQMNpI8axel6RN3Zr0KB79Eo6FNdY6FJYWqRPIUqSWM5iDq1QpPe9aGxqXRSV3w== X-Received: by 2002:a17:903:453:b0:1cf:5d4f:1981 with SMTP id iw19-20020a170903045300b001cf5d4f1981mr5449683plb.36.1700557383923; Tue, 21 Nov 2023 01:03:03 -0800 (PST) Received: from localhost ([107.155.12.245]) by smtp.gmail.com with ESMTPSA id c4-20020a170902c1c400b001ce5f0de726sm7193060plc.70.2023.11.21.01.03.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 21 Nov 2023 01:03:03 -0800 (PST) Date: Tue, 21 Nov 2023 17:03:00 +0800 From: chenqiwu To: Mark Rutland Cc: catalin.marinas@arm.com, will@kernel.org, kaleshsingh@google.com, ardb@kernel.org, mhiramat@kernel.org, linux-arm-kernel@lists.infradead.org Subject: Re: [PATCH] arm64: Add user stacktrace support Message-ID: <20231121090300.GA5120@rlk> References: <20231118134504.154842-1-qiwu.chen@transsion.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231121_010306_939936_16A887C3 X-CRM114-Status: GOOD ( 64.33 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, Nov 20, 2023 at 02:29:17PM +0000, Mark Rutland wrote: > On Sat, Nov 18, 2023 at 09:45:04PM +0800, qiwuchen55@gmail.com wrote: > > From: chenqiwu > > > > 1. Introduce and export arch_dump_user_stacktrace() API to support > > user stacktrace dump for a user task (both current and non-current task). > > A example test about the log format of user stacktrace as shown below: > > [test-515] Dump user backtrace: > > <0xffffb0c1a750> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000] > > <0xaaaacbf8097c> in /mnt/test[aaaacbf80000-aaaacbf81000] > > <0xffffb0b778b8> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000] > > <0xaaaacbf80834> in /mnt/test[aaaacbf80000-aaaacbf81000] > > Where is this used? > It's used in kernel space for some case need user backtrace for debugging, such as: https://lkml.org/lkml/2023/11/9/1365 > We already have user stacktracing code in arch/arm64/kernel/perf_callchain.c > which doesn't depend on this API. What does this API enable that we don't > support today? > Sorry, I indeed ignored this case, but it seems only work for perf syscall chain in case of CONFIG_PERF_EVENTS enabled without universality. It's supposed to introduce a common API for dumping user backtrace in kernel space. > > 2. Add arch_stack_walk_user() implementation to support userstacktrace transsionce option. > > What is this 'userstacktrace transsionce option' ? > > > A example test about the output format of ftrace userstacktrace as shown below: > > bash-489 [000] ..... 2167.660775: sched_process_fork: comm=bash pid=489 child_comm=bash child_pid=596 > > bash-489 [000] ..... 2167.660787: > > => /lib/aarch64-linux-gnu/libc-2.32.so[+0xa76d8] > > => /bin/bash[+0x5f354] > > => /bin/bash[+0x4876c] > > => /bin/bash[+0x4aec4] > > => /bin/bash[+0x4da48] > > => /bin/bash[+0x4b710] > > => /bin/bash[+0x4c31c] > > => /bin/bash[+0x339b0] > > > > Tested-by-by: chenqiwu > > Signed-off-by: chenqiwu > > --- > > arch/arm64/Kconfig | 1 + > > arch/arm64/kernel/stacktrace.c | 208 +++++++++++++++++++++++++++++++++ > > include/linux/stacktrace.h | 10 ++ > > 3 files changed, 219 insertions(+) > > As above, we already have user stacktracing code, and we shouldn't add > *distinct* unwinders. Either that code should be factored out and reused, or > this code should replace it. > Currently, ARM64 platform is not supported for ftrace userstacktrace profile feature, since CONFIG_USER_STACKTRACE_SUPPORT is not enabled, the call chain cannot be accessed: ftrace_trace_userstack -> stack_trace_save_user -> arch_stack_walk_user > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > index 7b071a004..4c5066f88 100644 > > --- a/arch/arm64/Kconfig > > +++ b/arch/arm64/Kconfig > > @@ -255,6 +255,7 @@ config ARM64 > > select TRACE_IRQFLAGS_SUPPORT > > select TRACE_IRQFLAGS_NMI_SUPPORT > > select HAVE_SOFTIRQ_ON_OWN_STACK > > + select USER_STACKTRACE_SUPPORT > > help > > ARM 64-bit (AArch64) Linux support. > > > > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c > > index 17f66a74c..4e7bf2922 100644 > > --- a/arch/arm64/kernel/stacktrace.c > > +++ b/arch/arm64/kernel/stacktrace.c > > @@ -215,6 +215,214 @@ static bool dump_backtrace_entry(void *arg, unsigned long where) > > return true; > > } > > > > +/* The struct defined for AArch64 userspace stack frame */ > > +struct stack_frame_user { > > + unsigned long fp; > > + unsigned long sp; > > + unsigned long pc; > > +}; > > + > > +/* > > + * The function of AArch64 userspace stack frame unwind method. > > + * Note: If the caller is not current task, it's supposed to call > > + * access_process_vm() to access another task' address space. > > + */ > > +static int arch_unwind_user_frame(struct task_struct *tsk, unsigned long high, > > + struct stack_frame_user *frame) > > +{ > > + int ret = 0; > > + unsigned long fp = frame->fp; > > + unsigned long low = frame->sp; > > + > > + if (fp < low || fp > high || fp & 0xf) > > + return -EFAULT; > > + > > + frame->sp = fp + 0x10; > > Given you always set frame->sp as fp + 0x10, why does frame->sp need to exist > at all? > frame->sp refer to the bottom of stack VMA, which increased at least 0x10 on every entry of arch_unwind_user_frame. A vaild frame->fp is suppoed to between upper and lower limit of task stack VMA. > Per AAPCS64, the frame record only conatins a copy of the FP and LR, and is > *not* directly associated with the SP, so I don't think we should pretend it > is. > > > + /* Disable page fault to make sure get_user going on wheels */ > > I have no idea what this comment is trying to say. > > Why exactly you you think we need to disable page faults? Isn't that going to > make this fail arbitrarily when we *can* fault pages in? I know that the > existing perf unwinder does this, but that's a design problem we'd like to > solve (e.g. by deferring the unwind until return to userspace). > Hhmm, I refer to the design of existing user unwinder. User access methods will not sleep when called from a pagefault_disabled(), so the get_user can be acceed in atomic context. > > + pagefault_disable(); > > + if (tsk == current) { > > + if (get_user(frame->fp, (unsigned long __user *)fp) || > > + get_user(frame->pc, (unsigned long __user *)(fp + 8))) > > + ret = -EFAULT; > > + } else { > > + if (access_process_vm(tsk, fp, &frame->fp, > > + sizeof(unsigned long), 0) != sizeof(unsigned long) || > > + access_process_vm(tsk, fp + 0x08, &frame->pc, > > + sizeof(unsigned long), 0) != sizeof(unsigned long)) > > + ret = -EFAULT; > > + } > > + pagefault_enable(); > > If task isn't current, userspace could be running and this will be racy and > unreliable. > > Where is this used with task != current? Why do we need to support that case at > all? > It's my idea to support the case for caller who really want to dump another task's backtrace. Not sure the racy and unreliablity since access_process_vm call get_task_mm and mmap_read_lock to pin the task's address space. We can see access_process_vm is safely used in get_cmdline() case. > What does this do for COMPAT tasks? > This patch is not covered COMPAT task unwinder yet. > > + > > + return ret; > > +} > > + > > +/* > > + * Print the executable address and corresponding VMA info. > > + */ > > +static void print_vma_addr_info(char *prefix, struct task_struct *task, > > + unsigned long ip, const char *loglvl) > > +{ > > + struct mm_struct *mm; > > + struct vm_area_struct *vma; > > + > > + if (task != current) > > + mm = get_task_mm(task); > > + else > > + mm = task->mm; > > Why can't we always use get_task_mm(), even for task == current? > get_task_mm increase the task's mm_users which refers to the number of users who access the mm in order to pin the task's mm. I think it's meaningless to use get_task_mm for task == current since task will not decrease its mm_users before calling do_exit -> exit_mm. > > + > > + if (!mm) > > + return; > > + /* > > + * we might be running from an atomic context so we cannot sleep > > + */ > > + if (!mmap_read_trylock(mm)) { > > + mmput(mm); > > + return; > > + } > > When is this called from an atomic context? > User who call it in an atomic context such as interrupt context. > > + > > + vma = find_vma(mm, ip); > > + if (vma && vma->vm_file) { > > + struct file *f = vma->vm_file; > > + char *buf = (char *)__get_free_page(GFP_NOWAIT); > > + > > + if (buf) { > > + char *p; > > + > > + p = file_path(f, buf, PAGE_SIZE); > > + if (IS_ERR(p)) > > + p = "?"; > > + printk("%s%s%s[%lx-%lx]\n", loglvl, prefix, p, > > + vma->vm_start, > > + vma->vm_end); > > + free_page((unsigned long)buf); > > + } > > + } > > + mmap_read_unlock(mm); > > + if (task != current) > > + mmput(mm); > > +} > > + > > +static struct vm_area_struct *find_user_stack_vma(struct task_struct *task, unsigned long sp) > > +{ > > + struct mm_struct *mm; > > + struct vm_area_struct *vma; > > + > > + if (task != current) > > + mm = get_task_mm(task); > > + else > > + mm = task->mm; > > + > > + if (!mm) > > + return NULL; > > + /* > > + * we might be running from an atomic context so we cannot sleep > > + */ > > + if (!mmap_read_trylock(mm)) { > > + mmput(mm); > > + return NULL; > > + } > > + vma = find_vma(mm, sp); > > + mmap_read_unlock(mm); > > + if (task != current) > > + mmput(mm); > > What guarantees the VMA is safe to use after this? What ensures that it won't > be freed? What ensures that it is still valid and not subject to concurrent > modification? > get_task_mm and mmap_read_trylock will ensure the VMA safe to use, the reliazation of two APIs refer to print_vma_addr() in mm/memory.c. > > + > > + return vma; > > +} > > + > > +static void dump_user_backtrace_entry(struct task_struct *tsk, > > + unsigned long where, const char *loglvl) > > +{ > > + char prefix[64]; > > + > > + snprintf(prefix, sizeof(prefix), "<0x%lx> in ", where); > > + print_vma_addr_info(prefix, tsk, where, loglvl); > > +} > > + > > +void arch_dump_user_stacktrace(struct pt_regs *regs, struct task_struct *tsk, > > + const char *loglvl) > > +{ > > + struct stack_frame_user frame; > > + struct vm_area_struct *vma; > > + unsigned long userstack_start, userstack_end; > > + > > + if (!tsk) > > + tsk = current; > > + > > + /* > > + * If @regs is not specified or caller is not current task,. > > + * @regs is supposed to get from @tsk. > > + */ > > + if (!regs || tsk != current) > > + regs = task_pt_regs(tsk); > > The user state is *always* in task_pt_regs(tsk), even when tsk == current. > > Why does this function take the regs as an argument at all? > The API export the two argument(regs and tsk) for caller, we must make legitimacy judgments to aviod the caller passed unreasonable arguments. > > + > > + /* TODO: support stack unwind for compat user mode */ > > + if (compat_user_mode(regs)) > > + return; > > + > > + userstack_start = regs->user_regs.sp; > > + vma = find_user_stack_vma(tsk, userstack_start); > > + if (!vma) > > + return; > > + > > + userstack_end = vma->vm_end; > > + frame.fp = regs->user_regs.regs[29]; > > + frame.sp = userstack_start; > > + frame.pc = regs->user_regs.pc; > > + > > + printk("%s[%s-%d] Dump user backtrace:\n", loglvl, tsk->comm, tsk->pid); > > + while (1) { > > + unsigned long where = frame.pc; > > + > > + if (!where || where & 0x3) > > + break; > > + dump_user_backtrace_entry(tsk, where, loglvl); > > + if (arch_unwind_user_frame(tsk, userstack_end, &frame) < 0) > > + break; > > + } > > +} > > +EXPORT_SYMBOL_GPL(arch_dump_user_stacktrace); > > Where is this used from? > > Why should it be exported? > As replied at front, the API is supposed to be used by kernel space such as kernel modules. > > + > > +/** > > + * stack_trace_save_user - Save user space stack traces into a storage array > > + * @consume_entry: Callback for save a user space stack trace > > + * @cookie: Caller supplied pointer handed back by arch_stack_walk() > > + * @regs: The pt_regs pointer of current task > > + */ > > +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > > + const struct pt_regs *regs) > > +{ > > + struct stack_frame_user frame; > > + struct vm_area_struct *vma; > > + unsigned long userstack_start, userstack_end; > > + struct task_struct *tsk = current; > > + > > + /* TODO: support stack unwind for compat user mode */ > > + if (!regs || !user_mode(regs) || compat_user_mode(regs)) > > + return; > > + > > + userstack_start = regs->user_regs.sp; > > + vma = find_user_stack_vma(tsk, userstack_start); > > + if (!vma) > > Yet again this duplicates the code above. > > If we really need this, then arch_stack_walk_user() should be the real > unwinder, and the caes above should be built atop arch_stack_walk_user(). > > > + return; > > + > > + userstack_end = vma->vm_end; > > + frame.fp = regs->user_regs.regs[29]; > > + frame.sp = userstack_start; > > + frame.pc = regs->user_regs.pc; > > + > > + while (1) { > > + unsigned long where = frame.pc; > > + > > + /* Sanity check: ABI requires pc to be aligned 4 bytes. */ > > + if (!where || where & 0x3) > > + break; > > Why do we care whether the PC is valid? > If pc is invaild, it's meaningless to unwind whole unwind, just skip unwinding. > There are plenty of other things that we could check (e.g. whether this points > to executable memory), but it seems kinda pointless to care beyond whether we > can unwind the frame. > > Note that we're missing the LR anyway, so this *isn't* a reliable unwind. > The frame.pc is used to record LR, sanity check for lr and fp will make a reliable unwind since we can safely use print_vma_addr_info() to transfer LR addr to VMA. > Thanks, > Mark. > > > + if (!consume_entry(cookie, where)) > > + break; > > + if (arch_unwind_user_frame(tsk, userstack_end, &frame) < 0) > > + break; > > + } > > +} > > + > > void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, > > const char *loglvl) > > { > > diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h > > index 97455880a..bc5a7bf56 100644 > > --- a/include/linux/stacktrace.h > > +++ b/include/linux/stacktrace.h > > @@ -60,6 +60,16 @@ int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie, > > > > void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, > > const struct pt_regs *regs); > > + > > +/** > > + * arch_dump_user_stacktrace - Architecture specific function to dump the > > + * stack trace for user process > > + * @regs: Pointer to the pt_regs of user process > > + * @tsk: Pointer to the task_struct of user process > > + * @loglvl: Log level > > + */ > > +void arch_dump_user_stacktrace(struct pt_regs *regs, struct task_struct *tsk, > > + const char *loglvl); > > #endif /* CONFIG_ARCH_STACKWALK */ > > > > #ifdef CONFIG_STACKTRACE > > -- > > 2.25.1 > > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel