public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Wang Qing <wangqing@vivo.com>
Cc: opensource.kernel@vivo.com, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu
Date: Tue, 14 Apr 2020 13:54:54 +0100	[thread overview]
Message-ID: <20200414125454.GE2486@C02TD0UTHF1T.local> (raw)
In-Reply-To: <1586769788-5954-1-git-send-email-wangqing@vivo.com>

On Mon, Apr 13, 2020 at 05:23:08PM +0800, Wang Qing wrote:
> >Hi,
> >
> >On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
> >> We cannot get FP and PC when the task is running on another CPU,
> >> task->thread.cpu_context is the last time the task was switched out,
> >> it's better to give a reminder than to provide wrong information.
> >> 
> >> Signed-off-by: Wang Qing <wangqing@vivo.com>
> >
> >Are you seeing this happen anywhere in particular today?
> 
> This problem is not so obvious, because it will not cause any exceptions
> but will show "old" stack always ending with switch_to, I finally confirmed
> the problem through debugging.
> 
> For example:Task blocked in spinlock/interrupt/busy loop, I want to print
> the backtrace when detected(like PSI in Android), the printing is wrong(old).

Sure, but *where* are you seeing this?

Is this a problem in mainline, or only in code that you add?
 
> >
> >> ---
> >>  arch/arm64/kernel/traps.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >> 
> >> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> >> index cf402be..c04e3e8 100644
> >> --- a/arch/arm64/kernel/traps.c
> >> +++ b/arch/arm64/kernel/traps.c
> >> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
> >>  		start_backtrace(&frame,
> >>  				(unsigned long)__builtin_frame_address(0),
> >>  				(unsigned long)dump_backtrace);
> >> +	} else if (tsk->on_cpu) {
> >> +		/*
> >> +		 * The task is running in another cpu, so the call stack
> >> +		 * is changing and we cannot get it.
> >> +		 */
> >> +		pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
> >> +			tsk->comm, tsk->cpu);
> >
> >I believe that we can race with a concurrent write to tsk->cpu in both
> >cases above. We could use READ_ONCE() to get a snapshot, but we can
> >still race and miss cases where the task was runnning as we backtrace
> >it.
> >
> >Thanks,
> >Mark.
> 
> I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
> include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
> we can still race and miss cases where the task was runnning as we backtrace.
> 
> But from the user's perspective, printing wrong backtrace is confused when
> we call this function while task already running. However, it's reasonable to
> print the last backtrace when task enter running during the function is called.

The contract of dump_backtrace() is that it's only called for either:

* the current task
* a task that is blocked in switch_to()

... so I don't think that the current behaviour is wrong as such, though
if it's easy to catch misuse I agree it would be nice to do so for
robustness.

However, we can alwatys race here, so detecting this case is best-effort
and not entirely reliable, and I don't want to leave the impression that
it is. I'm also not a fan of pr_warn() here, since this is indicative of
a kernel bug rather than a user/system issue, and can spam the console
due to a lack of ratelimiting.

So, based on whether this is a problem in existing code, I'd like that
code fix first, and then we can consider adding a WARN_ONCE() or
something ratelimited. Ideally something that'll give us the bactrace or
the code that's calling dump_backtrace() erroneously.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

      reply	other threads:[~2020-04-14 12:55 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-09  9:38 [PATCH] ARM64: fixed dump_backtrace() when task running on another cpu Wang Qing
2020-04-09 10:40 ` Mark Rutland
2020-04-13  9:23 ` Wang Qing
2020-04-14 12:54   ` Mark Rutland [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200414125454.GE2486@C02TD0UTHF1T.local \
    --to=mark.rutland@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=opensource.kernel@vivo.com \
    --cc=wangqing@vivo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox