public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
From: Will Deacon <will@kernel.org>
To: minyard@acm.org
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Corey Minyard <cminyard@mvista.com>
Subject: Re: [PATCH v2] arm64:kgdb: Fix kernel single-stepping
Date: Thu, 20 Feb 2020 14:22:14 +0000	[thread overview]
Message-ID: <20200220142214.GC14459@willie-the-truck> (raw)
In-Reply-To: <20200219152403.3495-1-minyard@acm.org>

On Wed, Feb 19, 2020 at 09:24:03AM -0600, minyard@acm.org wrote:
> From: Corey Minyard <cminyard@mvista.com>
> 
> I was working on a single-step bug on kgdb on an ARM64 system, and I saw
> this scenario:
> 
> * A single step is setup to return to el1
> * The ERET return to el1
> * An interrupt is pending and runs before the instruction
> * As soon as PSTATE.D (the debug disable bit) is cleared, the single
>     step happens in that location, not where it should have.
> 
> This appears to be due to PSTATE.SS not being cleared when the exception
> happens.  Per section D.2.12.5 of the ARMv8 reference manual, that
> appears to be incorrect, it says "As part of exception entry, the PE
> does all of the following: ...  Sets PSTATE.SS to 0."

Sorry, but I don't follow you here. If PSTATE.SS is not cleared, why would
you take the step exception?

> However, I appear to not be the first person who has noticed this.  In
> the el0-only portion of the kernel_entry macro in entry.S, I found the
> following comment: "Ensure MDSCR_EL1.SS is clear, since we can unmask
> debug exceptions when scheduling."  Exactly the same scenario, except
> coming from a userland single step, not a kernel one.

No, I think you might be conflating PSTATE.SS and MDSCR_EL1.SS.

> As I was studying this, though, I realized that the following scenario
> had an issue:
> 
> * Kernel enables MDSCR.SS, MDSCR.KDE, MDSCR.MDE (unnecessary), and
>   PSTATE.SS to enable a single step in el1, for kgdb or kprobes,
>   on the current CPU's MDSCR register and the process' PSTATE.SS
>   register.
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.
> * The exception causes the task to migrate to a new core.
> * The return from the exception runs on a different processor now,
>   where the MDSCR values are not set up for a single step.
> * The single step fails to happen.
> 
> This is bad for kgdb, of course, but it seems really bad for kprobes if
> this happens.

I don't see how this can happen for kprobes. Have you managed to reproduce
the failure?

> To fix both these problems, rework the handling of single steps to clear
> things out upon entry to the kernel from el1, and then to set up single
> step when returning to el1, and not do the setup in debug-monitors.c.
> This means that single stepping does not use
> enable/disable_debug_monitors(); it is no longer necessary to track
> those flags for single stepping.  This is much like single stepping is
> handled for el0.  A new flag is added in pt_regs to enable single
> stepping from el1.  Unfortunately, the old value of PSTATE.SS cannot be
> used for this because of the hardware bug mentioned earlier.

I don't think there's a hardware bug.

It sound like you're trying to make kernel debugging per-task instead
of per-cpu, but I don't think that's the right thing to do. What if I /want/
to debug an interrupt handler? For example, I might have a watchpoint on
something accessed by timer interrupt.

> As part of this, there is an interaction between single stepping and the
> other users of debug monitors with the MDSCR.KDE bit.  That bit has to
> be set for both hardware breakpoints at el1 and single stepping at el1.
> A new variable was created to store the cpu-wide value of MDSCR.KDE; the
> single stepping code makes sure not to clear that bit on kernel entry if
> it's set in the per-cpu variable.
> 
> After fixing this and doing some more testing, I ran into another issue:
> 
> * Kernel enables the pt_regs single step
> * Kernel returns from the exception with ERET.
> * An interrupt or page fault happens on the instruction, causing the
>   instruction to not be run, but the exception handler runs.

This sounds like you've broken debug; we should take the step exception
in the exception handler. That's the way this is supposed to work.

> There's no easy way to find the pt_regs that has the single step flag
> set.  So a thread info flag was added so that the single step could be
> disabled in this case.  Both that flag and the flag in pt_regs must be
> set to enable a single step.

Honestly, I get the feeling that you don't really understand the code
you're changing here and it's a tonne of effort to try to untangle what
you're doing. That's not necessarily your fault because the debug
architecture is a nightmare to comprehend, but I'm not keen to change it
unless we have a really good justification. I'm sure kgdb is riddled with
bugs but, as I said before, the fixes should be in kgdb, not by tearing
up the low-level debug code (which has the potential to break other users).

Maybe it would be easier if you tried to fix one problem per patch,
preferably with a way to reproduce the issue you're seeing each time?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-02-20 14:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-19 15:24 [PATCH v2] arm64:kgdb: Fix kernel single-stepping minyard
2020-02-20 14:06 ` Daniel Thompson
2020-02-20 14:52   ` Corey Minyard
2020-02-20 14:21 ` Marc Zyngier
2020-02-20 14:50   ` Corey Minyard
2020-02-20 15:06     ` Marc Zyngier
2020-02-20 14:22 ` Will Deacon [this message]
2020-02-20 16:30   ` Corey Minyard
2020-02-20 21:30     ` Corey Minyard
2020-02-24 18:07       ` James Morse
2020-02-25 15:38         ` Corey Minyard
2020-02-25 17:55           ` James Morse
2020-02-26  2:58             ` Corey Minyard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200220142214.GC14459@willie-the-truck \
    --to=will@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=cminyard@mvista.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=minyard@acm.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox