From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 24 Nov 2023 17:55:27 +0800
From: chenqiwu
To: Mark Rutland
Cc: catalin.marinas@arm.com, will@kernel.org, kaleshsingh@google.com,
	ardb@kernel.org, mhiramat@kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64: Add user stacktrace support
Message-ID: <20231124095527.GA73913@rlk>
References: <20231118134504.154842-1-qiwu.chen@transsion.com>
 <20231121090300.GA5120@rlk>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline

On Tue, Nov 21, 2023 at 02:05:29PM +0000, Mark Rutland wrote:
> On Tue, Nov 21, 2023 at 05:03:00PM +0800, chenqiwu wrote:
> > On Mon, Nov 20, 2023 at 02:29:17PM +0000, Mark Rutland wrote:
> > > On Sat, Nov 18, 2023 at 09:45:04PM +0800, qiwuchen55@gmail.com wrote:
> > > > From: chenqiwu
> > > >
> > > > 1. Introduce and export an arch_dump_user_stacktrace() API to support
> > > > dumping the user stacktrace of a user task (both the current and a
> > > > non-current task). An example of the user stacktrace log format is
> > > > shown below:
> > > > [test-515] Dump user backtrace:
> > > > <0xffffb0c1a750> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000]
> > > > <0xaaaacbf8097c> in /mnt/test[aaaacbf80000-aaaacbf81000]
> > > > <0xffffb0b778b8> in /lib/aarch64-linux-gnu/libc-2.32.so[ffffb0b53000-ffffb0cb1000]
> > > > <0xaaaacbf80834> in /mnt/test[aaaacbf80000-aaaacbf81000]
> > >
> > > Where is this used?
> >
> > It's used in kernel space for cases that need a user backtrace for
> > debugging, such as: https://lkml.org/lkml/2023/11/9/1365
>
> That's not in mainline.
I know; you may have misread my intention in introducing the
arch_dump_user_stacktrace API. That case dumps the kernel panic trace and
any user regs and backtrace info on global init exit. If the
arch_dump_user_stacktrace API can be merged into the kernel tree first, we
can use it directly to dump the user backtrace on global init exit, which
will be helpful in finding the reason for the global init exit.

BTW, I think there are two significant places where
arch_dump_user_stacktrace could be used:

1) We can dump the backtrace of any important user thread, without log
ratelimiting, in arm64_show_signal(), e.g. the global_init thread:

+++ b/arch/arm64/kernel/traps.c
@@ -247,12 +247,14 @@ static void arm64_show_signal(int signo, const char *str)
 	unsigned long esr = tsk->thread.fault_code;
 	struct pt_regs *regs = task_pt_regs(tsk);
 
+	if (unlikely(is_global_init(tsk)))
+		goto dump;
 	/* Leave if the signal won't be shown */
 	if (!show_unhandled_signals ||
 	    !unhandled_signal(tsk, signo) ||
 	    !__ratelimit(&rs))
 		return;
-
+dump:
 	pr_info("%s[%d]: unhandled exception: ", tsk->comm, task_pid_nr(tsk));
 	if (esr)
 		pr_cont("%s, ESR 0x%016lx, ", esr_get_class_string(esr), esr);
@@ -261,6 +263,7 @@ static void arm64_show_signal(int signo, const char *str)
 	print_vma_addr(KERN_CONT " in ", regs->pc);
 	pr_cont("\n");
 	__show_regs(regs);
+	arch_dump_user_stacktrace(regs);
 }

2) We can dump the backtrace of a user thread blocked in D state in
hung_task; that way we can get more clues to find out why the user thread
is blocked:

+++ b/kernel/hung_task.c
@@ -137,6 +137,8 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 			init_utsname()->version);
 		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
 			" disables this message.\n");
+		if (user_mode(regs))
+			arch_dump_user_stacktrace(task_pt_regs(t));
 		sched_show_task(t);

> > > We already have user stacktracing code in arch/arm64/kernel/perf_callchain.c
> > > which doesn't depend on this API. What does this API enable that we don't
> > > support today?
> > Sorry, I indeed overlooked that case, but it seems to only work for the
> > perf callchain with CONFIG_PERF_EVENTS enabled, so it is not universal.
> > The intent here is to introduce a common API for dumping a user backtrace
> > in kernel space.
>
> I understand that; I'm saying that you should *modify* this code to handle
> both cases.
>
> > > > 2. Add arch_stack_walk_user() implementation to support userstacktrace
> > > > transsionce option.
> > >
> > > What is this 'userstacktrace transsionce option' ?
> > >
> > > > An example of the ftrace userstacktrace output format is shown below:
> > > > bash-489 [000] ..... 2167.660775: sched_process_fork: comm=bash pid=489 child_comm=bash child_pid=596
> > > > bash-489 [000] ..... 2167.660787:
> > > > => /lib/aarch64-linux-gnu/libc-2.32.so[+0xa76d8]
> > > > => /bin/bash[+0x5f354]
> > > > => /bin/bash[+0x4876c]
> > > > => /bin/bash[+0x4aec4]
> > > > => /bin/bash[+0x4da48]
> > > > => /bin/bash[+0x4b710]
> > > > => /bin/bash[+0x4c31c]
> > > > => /bin/bash[+0x339b0]
> > > >
> > > > Tested-by: chenqiwu
> > > > Signed-off-by: chenqiwu
> > > > ---
> > > >  arch/arm64/Kconfig             |   1 +
> > > >  arch/arm64/kernel/stacktrace.c | 208 +++++++++++++++++++++++++++++++++
> > > >  include/linux/stacktrace.h     |  10 ++
> > > >  3 files changed, 219 insertions(+)
> > >
> > > As above, we already have user stacktracing code, and we shouldn't add
> > > *distinct* unwinders. Either that code should be factored out and reused,
> > > or this code should replace it.
> >
> > Currently, the ftrace userstacktrace feature is not supported on the arm64
> > platform: since CONFIG_USER_STACKTRACE_SUPPORT is not enabled, the call
> > chain ftrace_trace_userstack -> stack_trace_save_user ->
> > arch_stack_walk_user cannot be reached.
>
> As above, please reuse the existing code, e.g. take the existing logic in
> perf_callchain_user() and use it to implement arch_stack_walk_user(), and
> call that from perf_callchain_user() and stack_trace_save_user().
Okay, I will implement stack_trace_save_user() with reference to
perf_callchain_user() and resend this patch as v2.

Thanks,
Qiwu

> > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > > > index 7b071a004..4c5066f88 100644
> > > > --- a/arch/arm64/Kconfig
> > > > +++ b/arch/arm64/Kconfig
> > > > @@ -255,6 +255,7 @@ config ARM64
> > > >  	select TRACE_IRQFLAGS_SUPPORT
> > > >  	select TRACE_IRQFLAGS_NMI_SUPPORT
> > > >  	select HAVE_SOFTIRQ_ON_OWN_STACK
> > > > +	select USER_STACKTRACE_SUPPORT
> > > >  	help
> > > >  	  ARM 64-bit (AArch64) Linux support.
> > > >
> > > > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > > > index 17f66a74c..4e7bf2922 100644
> > > > --- a/arch/arm64/kernel/stacktrace.c
> > > > +++ b/arch/arm64/kernel/stacktrace.c
> > > > @@ -215,6 +215,214 @@ static bool dump_backtrace_entry(void *arg, unsigned long where)
> > > >  	return true;
> > > >  }
> > > >
> > > > +/* The struct defined for an AArch64 userspace stack frame */
> > > > +struct stack_frame_user {
> > > > +	unsigned long fp;
> > > > +	unsigned long sp;
> > > > +	unsigned long pc;
> > > > +};
> > > > +
> > > > +/*
> > > > + * The AArch64 userspace stack frame unwind method.
> > > > + * Note: if the caller is not the current task, it is supposed to call
> > > > + * access_process_vm() to access the other task's address space.
> > > > + */
> > > > +static int arch_unwind_user_frame(struct task_struct *tsk, unsigned long high,
> > > > +				  struct stack_frame_user *frame)
> > > > +{
> > > > +	int ret = 0;
> > > > +	unsigned long fp = frame->fp;
> > > > +	unsigned long low = frame->sp;
> > > > +
> > > > +	if (fp < low || fp > high || fp & 0xf)
> > > > +		return -EFAULT;
> > > > +
> > > > +	frame->sp = fp + 0x10;
> > >
> > > Given you always set frame->sp as fp + 0x10, why does frame->sp need to
> > > exist at all?
> >
> > frame->sp refers to the bottom of the stack VMA, and is increased by at
> > least 0x10 on every entry to arch_unwind_user_frame.
> > A valid frame->fp is supposed to lie between the upper and lower limits
> > of the task's stack VMA.
>
> What you're calling the 'sp' is the top of the last frame record. It has
> nothing to do with the SP, and has nothing to do with any VMA.
>
> You don't need a separate field for this.
>
> > > Per AAPCS64, the frame record only contains a copy of the FP and LR, and
> > > is *not* directly associated with the SP, so I don't think we should
> > > pretend it is.
> > >
> > > > +	/* Disable page faults to make sure get_user() goes on wheels */
> > >
> > > I have no idea what this comment is trying to say.
> > >
> > > Why exactly do you think we need to disable page faults? Isn't that going
> > > to make this fail arbitrarily when we *can* fault pages in? I know that
> > > the existing perf unwinder does this, but that's a design problem we'd
> > > like to solve (e.g. by deferring the unwind until return to userspace).
> >
> > Hmm, I referred to the design of the existing user unwinder. User access
> > methods will not sleep when called with page faults disabled, so
> > get_user() can be used in atomic context.
>
> I think the comment should clearly say that, then.
>
> I was confused by "going on wheels", which doesn't mean anything in
> particular.
>
> > > > +	pagefault_disable();
> > > > +	if (tsk == current) {
> > > > +		if (get_user(frame->fp, (unsigned long __user *)fp) ||
> > > > +		    get_user(frame->pc, (unsigned long __user *)(fp + 8)))
> > > > +			ret = -EFAULT;
> > > > +	} else {
> > > > +		if (access_process_vm(tsk, fp, &frame->fp,
> > > > +				sizeof(unsigned long), 0) != sizeof(unsigned long) ||
> > > > +		    access_process_vm(tsk, fp + 0x08, &frame->pc,
> > > > +				sizeof(unsigned long), 0) != sizeof(unsigned long))
> > > > +			ret = -EFAULT;
> > > > +	}
> > > > +	pagefault_enable();
> > >
> > > If the task isn't current, userspace could be running, and this will be
> > > racy and unreliable.
> > >
> > > Where is this used with task != current?
> > > Why do we need to support that case at all?
> >
> > It's my idea to support the case for callers who really want to dump
> > another task's backtrace.
>
> Are there *any* existing users in the kernel tree?
>
> If not, please just delete this.
>
> > Not sure about the raciness and unreliability, since access_process_vm()
> > calls get_task_mm() and mmap_read_lock() to pin the task's address space.
>
> If task != current, then it can be concurrently executing. In that case,
> its regs are stale and the task can concurrently modify its stack. Pinning
> the memory doesn't prevent that.
>
> > We can see that access_process_vm() is safely used in the get_cmdline()
> > case.
> >
> > > What does this do for COMPAT tasks?
> >
> > This patch does not cover the COMPAT task unwinder yet.
>
> Supporting COMPAT is necessary for this to be merged.
>
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Print the executable address and corresponding VMA info.
> > > > + */
> > > > +static void print_vma_addr_info(char *prefix, struct task_struct *task,
> > > > +				unsigned long ip, const char *loglvl)
> > > > +{
> > > > +	struct mm_struct *mm;
> > > > +	struct vm_area_struct *vma;
> > > > +
> > > > +	if (task != current)
> > > > +		mm = get_task_mm(task);
> > > > +	else
> > > > +		mm = task->mm;
> > >
> > > Why can't we always use get_task_mm(), even for task == current?
> >
> > get_task_mm() increases the task's mm_users count, i.e. the number of
> > users accessing the mm, in order to pin the task's mm. I think it's
> > meaningless to use get_task_mm() for task == current, since the task will
> > not decrease its mm_users before calling do_exit -> exit_mm.
>
> So? It's *safe* to use get_task_mm() and mmput() regardless, and it's far
> simpler than conditional logic.
>
> Please do not treat current as a special case. Always use the same logic
> to get/put the mm.
> > > > +
> > > > +	if (!mm)
> > > > +		return;
> > > > +	/*
> > > > +	 * We might be running from an atomic context, so we cannot sleep.
> > > > +	 */
> > > > +	if (!mmap_read_trylock(mm)) {
> > > > +		mmput(mm);
> > > > +		return;
> > > > +	}
> > >
> > > When is this called from an atomic context?
> >
> > Users who call it in an atomic context, such as interrupt context.
>
> Yes, but *which* users?
>
> e.g. does ftrace end up calling this in atomic context, because we might
> trace functions called in an IRQ handler?
>
> > > > +
> > > > +	vma = find_vma(mm, ip);
> > > > +	if (vma && vma->vm_file) {
> > > > +		struct file *f = vma->vm_file;
> > > > +		char *buf = (char *)__get_free_page(GFP_NOWAIT);
> > > > +
> > > > +		if (buf) {
> > > > +			char *p;
> > > > +
> > > > +			p = file_path(f, buf, PAGE_SIZE);
> > > > +			if (IS_ERR(p))
> > > > +				p = "?";
> > > > +			printk("%s%s%s[%lx-%lx]\n", loglvl, prefix, p,
> > > > +				vma->vm_start,
> > > > +				vma->vm_end);
> > > > +			free_page((unsigned long)buf);
> > > > +		}
> > > > +	}
> > > > +	mmap_read_unlock(mm);
> > > > +	if (task != current)
> > > > +		mmput(mm);
> > > > +}
> > > > +
> > > > +static struct vm_area_struct *find_user_stack_vma(struct task_struct *task, unsigned long sp)
> > > > +{
> > > > +	struct mm_struct *mm;
> > > > +	struct vm_area_struct *vma;
> > > > +
> > > > +	if (task != current)
> > > > +		mm = get_task_mm(task);
> > > > +	else
> > > > +		mm = task->mm;
> > > > +
> > > > +	if (!mm)
> > > > +		return NULL;
> > > > +	/*
> > > > +	 * We might be running from an atomic context, so we cannot sleep.
> > > > +	 */
> > > > +	if (!mmap_read_trylock(mm)) {
> > > > +		mmput(mm);
> > > > +		return NULL;
> > > > +	}
> > > > +	vma = find_vma(mm, sp);
> > > > +	mmap_read_unlock(mm);
> > > > +	if (task != current)
> > > > +		mmput(mm);
> > >
> > > What guarantees the VMA is safe to use after this? What ensures that it
> > > won't be freed? What ensures that it is still valid and not subject to
> > > concurrent modification?
> > get_task_mm() and mmap_read_trylock() will ensure the VMA is safe to use;
> > the implementation of the two APIs follows print_vma_addr() in
> > mm/memory.c.
>
> We've just called:
>
>	mmap_read_unlock(mm);
>	...
>	mmput(mm);
>
> What ensures that the dangling reference to the VMA is still safe?
>
> AFAICT, nothing does.
>
> > > > +
> > > > +	return vma;
> > > > +}
> > > > +
> > > > +static void dump_user_backtrace_entry(struct task_struct *tsk,
> > > > +				unsigned long where, const char *loglvl)
> > > > +{
> > > > +	char prefix[64];
> > > > +
> > > > +	snprintf(prefix, sizeof(prefix), "<0x%lx> in ", where);
> > > > +	print_vma_addr_info(prefix, tsk, where, loglvl);
> > > > +}
> > > > +
> > > > +void arch_dump_user_stacktrace(struct pt_regs *regs, struct task_struct *tsk,
> > > > +				const char *loglvl)
> > > > +{
> > > > +	struct stack_frame_user frame;
> > > > +	struct vm_area_struct *vma;
> > > > +	unsigned long userstack_start, userstack_end;
> > > > +
> > > > +	if (!tsk)
> > > > +		tsk = current;
> > > > +
> > > > +	/*
> > > > +	 * If @regs is not specified or the caller is not the current task,
> > > > +	 * @regs is supposed to be taken from @tsk.
> > > > +	 */
> > > > +	if (!regs || tsk != current)
> > > > +		regs = task_pt_regs(tsk);
> > >
> > > The user state is *always* in task_pt_regs(tsk), even when tsk == current.
> > >
> > > Why does this function take the regs as an argument at all?
> >
> > The API exposes the two arguments (regs and tsk) to the caller, so we must
> > make legitimacy judgments to avoid the caller passing unreasonable
> > arguments.
>
> My point is that you can avoid that *by construction*.
>
> If this was:
>
>	void arch_dump_user_stacktrace(struct task_struct *tsk, const char *loglvl)
>
> ... then it's *impossible* for callers to pass incorrect arguments.
>
> That said, there is no in-tree user for this function.
>
> Please delete this function.
> > > > +
> > > > +	/* TODO: support stack unwind for compat user mode */
> > > > +	if (compat_user_mode(regs))
> > > > +		return;
> > > > +
> > > > +	userstack_start = regs->user_regs.sp;
> > > > +	vma = find_user_stack_vma(tsk, userstack_start);
> > > > +	if (!vma)
> > > > +		return;
> > > > +
> > > > +	userstack_end = vma->vm_end;
> > > > +	frame.fp = regs->user_regs.regs[29];
> > > > +	frame.sp = userstack_start;
> > > > +	frame.pc = regs->user_regs.pc;
> > > > +
> > > > +	printk("%s[%s-%d] Dump user backtrace:\n", loglvl, tsk->comm, tsk->pid);
> > > > +	while (1) {
> > > > +		unsigned long where = frame.pc;
> > > > +
> > > > +		if (!where || where & 0x3)
> > > > +			break;
> > > > +		dump_user_backtrace_entry(tsk, where, loglvl);
> > > > +		if (arch_unwind_user_frame(tsk, userstack_end, &frame) < 0)
> > > > +			break;
> > > > +	}
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(arch_dump_user_stacktrace);
> > >
> > > Where is this used from?
> > >
> > > Why should it be exported?
> >
> > As replied above, the API is supposed to be usable from kernel space, such
> > as by kernel modules.
>
> This should only be added if there is an in-tree user. There are no
> existing users and none are added by this series.
>
> Please delete this function.
>
> NAK to adding this.
> > > > +
> > > > +/**
> > > > + * stack_trace_save_user - Save user space stack traces into a storage array
> > > > + * @consume_entry: Callback to save a user space stack trace entry
> > > > + * @cookie: Caller-supplied pointer handed back by arch_stack_walk()
> > > > + * @regs: The pt_regs pointer of the current task
> > > > + */
> > > > +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie,
> > > > +			const struct pt_regs *regs)
> > > > +{
> > > > +	struct stack_frame_user frame;
> > > > +	struct vm_area_struct *vma;
> > > > +	unsigned long userstack_start, userstack_end;
> > > > +	struct task_struct *tsk = current;
> > > > +
> > > > +	/* TODO: support stack unwind for compat user mode */
> > > > +	if (!regs || !user_mode(regs) || compat_user_mode(regs))
> > > > +		return;
> > > > +
> > > > +	userstack_start = regs->user_regs.sp;
> > > > +	vma = find_user_stack_vma(tsk, userstack_start);
> > > > +	if (!vma)
> > >
> > > Yet again this duplicates the code above.
> > >
> > > If we really need this, then arch_stack_walk_user() should be the real
> > > unwinder, and the cases above should be built atop arch_stack_walk_user().
> > >
> > > > +		return;
> > > > +
> > > > +	userstack_end = vma->vm_end;
> > > > +	frame.fp = regs->user_regs.regs[29];
> > > > +	frame.sp = userstack_start;
> > > > +	frame.pc = regs->user_regs.pc;
> > > > +
> > > > +	while (1) {
> > > > +		unsigned long where = frame.pc;
> > > > +
> > > > +		/* Sanity check: the ABI requires the pc to be 4-byte aligned. */
> > > > +		if (!where || where & 0x3)
> > > > +			break;
> > >
> > > Why do we care whether the PC is valid?
> >
> > If the pc is invalid, it's meaningless to do the whole unwind, so we just
> > skip unwinding.
>
> Why should we enforce that policy?
>
> If the PC is 0, or is in a non-executable page, it's equally meaningless,
> but we don't check that.
>
> What if someone is deliberately using a non-canonical PC to force a
> prefetch abort, which they'll handle later (e.g. JITs might do that)?
> > > There are plenty of other things that we could check (e.g. whether this
> > > points to executable memory), but it seems kinda pointless to care beyond
> > > whether we can unwind the frame.
> > >
> > > Note that we're missing the LR anyway, so this *isn't* a reliable unwind.
> >
> > frame.pc is used to record the LR; sanity checks on the lr and fp will
> > make the unwind reliable, since we can safely use print_vma_addr_info()
> > to map the LR address to a VMA.
>
> I'm saying you're missing the regs->user_regs.lr, and since you cannot
> know whether that LR is valid, the unwind is unreliable. In the absence
> of metadata from the compiler, you cannot handle that reliably.
>
> The existing perf unwinder has the same issue, and we leave it to
> userspace to recover the LR via DWARF if necessary, which isn't an option
> here.
>
> Thanks,
> Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel