Simplifying or removing DEBUG_STACK?

public inbox for linux-kernel@vger.kernel.org

* Simplifying or removing DEBUG_STACK?
@ 2015-04-21 19:57 Andy Lutomirski
  2015-04-22  6:52 ` Ingo Molnar
  0 siblings, 1 reply; 2+ messages in thread
From: Andy Lutomirski @ 2015-04-21 19:57 UTC
  To: H. Peter Anvin, Jan Beulich, Ingo Molnar, X86 ML,
	linux-kernel@vger.kernel.org, Linus Torvalds, Borislav Petkov,
	Denys Vlasenko, Steven Rostedt, Peter Zijlstra

Hi all-

On x86_64, we use IST for #BP and #DB.  On x86_32, we don't.

We started using IST for #BP in:

b556b35e98ad [PATCH] x86_64: Move int 3 handler to debug stack and
allow to increase it.

and we started using IST for #DB even earlier in:

7abe2c67299e [PATCH] x86-64 merge for 2.6.4

This has some unpleasant side effects these days.  Primarily, it
requires a bunch of ugly code to avoid recursive use of the debug
stack when, say, an NMI interrupts do_int3 or do_debug and either hits
a kprobe int3 or a #DB if it inadvertently touches a userspace
watchpoint.  See TRACE_IRQS_OFF_DEBUG for another big wart in that
code.

Here are all of the reasons I can come up with for using IST:

1. SYSENTER with TF set will immediately (or after one instruction --
I'm not quite sure) cause #DB.  This is easy to handle -- we can just
set up a sysenter stack just like x86_32.

2. #DB needs paranoid gsbase handling (due to SYSENTER if nothing
else).  However, there's no real reason that IST and paranoid gsbase
handling need to be tied together.

3. Stack usage.  Almost anything can hit a kprobe and any uaccess
operation can hit a watchpoint.  I'm not sure how much of a problem
this is.  If it is a real problem, we could use something more like
the irqstack mechanism instead of IST.

4. kgdb.  kgdb doesn't appear to respect the kprobe blacklist at all,
so kgdb would blow up if it tried to breakpoint early or late in
syscall handling.  (Hmm.  I bet kgdb also blows up if you use it to
put a breakpoint early in do_int3.)

Thoughts?

Even if it turns out that we can't get rid of IST for #DB and #BP, I
bet we could simplify matters by rigging up all of the IST entries
to switch IST off for #DB and #BP immediately upon entry and to leave
them off until immediately before returning, thereby simplifying the
logic quite a bit.  I think this would be a pure performance win --
the only path here in which performance matters is NMI AFAICT, and
the NMI code already does that, albeit rather deeply buried.
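
To make the recursion problem and the "switch IST off on entry" idea
concrete, here is a rough user-space model.  It is only a sketch, not
kernel code: struct cpu_model, deliver(), outer_frame_survives() and
the sizes are hypothetical names made up for illustration, and it
assumes a single stack area and ignores the interrupted context's own
stack.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_WORDS 64  /* hypothetical size of the debug stack area */
#define FRAME_WORDS 8   /* hypothetical size of one exception frame */
#define OUTER_TAG   0xAAAAUL
#define NESTED_TAG  0xBBBBUL

/* Toy model of one CPU: a stack that grows down, plus an IST switch. */
struct cpu_model {
	unsigned long stack[STACK_WORDS];
	size_t sp;        /* current stack pointer, as an index */
	int ist_enabled;  /* does #DB/#BP delivery use IST? */
};

/*
 * Model exception delivery.  With IST enabled the stack pointer is
 * unconditionally reset to the fixed top of the area; with IST off the
 * frame is pushed below the current stack pointer.  Returns the index
 * of the frame that was written.
 */
static size_t deliver(struct cpu_model *c, unsigned long tag)
{
	if (c->ist_enabled)
		c->sp = STACK_WORDS;
	assert(c->sp >= FRAME_WORDS);
	c->sp -= FRAME_WORDS;
	for (size_t i = 0; i < FRAME_WORDS; i++)
		c->stack[c->sp + i] = tag;
	return c->sp;
}

/*
 * Take an outer #DB/#BP, then a nested one (say, from an NMI that hits
 * a kprobe).  Returns 1 if the outer frame is still intact afterwards.
 */
static int outer_frame_survives(int switch_ist_off)
{
	struct cpu_model c = { .sp = 0, .ist_enabled = 1 };
	size_t outer = deliver(&c, OUTER_TAG);

	if (switch_ist_off)
		c.ist_enabled = 0;   /* immediately upon entry */
	deliver(&c, NESTED_TAG);     /* nested exception */
	c.sp += FRAME_WORDS;         /* nested handler returns */
	if (switch_ist_off)
		c.ist_enabled = 1;   /* immediately before returning */

	return c.stack[outer] == OUTER_TAG;
}
```

In this model outer_frame_survives(0) reports that the nested entry
overwrote the outer frame, while outer_frame_survives(1) reports that
it did not, because the nested frame lands below the outer one.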

--Andy


* Re: Simplifying or removing DEBUG_STACK?
  2015-04-21 19:57 Simplifying or removing DEBUG_STACK? Andy Lutomirski
@ 2015-04-22  6:52 ` Ingo Molnar
  0 siblings, 0 replies; 2+ messages in thread
From: Ingo Molnar @ 2015-04-22  6:52 UTC
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Jan Beulich, X86 ML, linux-kernel@vger.kernel.org,
	Linus Torvalds, Borislav Petkov, Denys Vlasenko, Steven Rostedt,
	Peter Zijlstra


* Andy Lutomirski <luto@amacapital.net> wrote:

> Hi all-
> 
> On x86_64, we use IST for #BP and #DB.  On x86_32, we don't.
> 
> We started using IST for #BP in:
> 
> b556b35e98ad [PATCH] x86_64: Move int 3 handler to debug stack and
> allow to increase it.
> 
> and we started using IST for #DB even earlier in:
> 
> 7abe2c67299e [PATCH] x86-64 merge for 2.6.4
> 
> This has some unpleasant side effects these days.  Primarily, it
> requires a bunch of ugly code to avoid recursive use of the debug
> stack when, say, an NMI interrupts do_int3 or do_debug and either hits
> a kprobe int3 or a #DB if it inadvertently touches a userspace
> watchpoint.  See TRACE_IRQS_OFF_DEBUG for another big wart in that
> code.
> 
> Here are all of the reasons I can come up with for using IST:
> 
> 1. SYSENTER with TF set will immediately (or after one instruction --
> I'm not quite sure) cause #DB.  This is easy to handle -- we can just
> set up a sysenter stack just like x86_32.
> 
> 2. #DB needs paranoid gsbase handling (due to SYSENTER if nothing
> else).  However, there's no real reason that IST and paranoid gsbase
> handling need to be tied together.
> 
> 3. Stack usage.  Almost anything can hit a kprobe and any uaccess
> operation can hit a watchpoint.  I'm not sure how much of a problem
> this is.  If it is a real problem, we could use something more like
> the irqstack mechanism instead of IST.

This might have been an issue back when we still tried to fit things 
into 8K kernel stacks (4K on 32-bit). These days we have 16K kernel 
stacks on 64-bit:

  arch/x86/include/asm/page_64_types.h:#define THREAD_SIZE_ORDER  (2 + KASAN_STACK_ORDER)

and we also have irq stacks that dramatically reduce asynchronous 
stack nesting effects.
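
To spell out the arithmetic behind that figure, here is a minimal 
sketch.  It assumes 4K pages and that THREAD_SIZE is PAGE_SIZE shifted 
left by THREAD_SIZE_ORDER; thread_size() is a hypothetical helper, 
not a kernel function, and the PAGE_SHIFT value is an assumption:

```c
#include <assert.h>

#define PAGE_SHIFT 12                   /* assumption: 4K pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical helper: stack size in bytes for a given
 * KASAN_STACK_ORDER, using THREAD_SIZE_ORDER = 2 + KASAN_STACK_ORDER.
 */
static unsigned long thread_size(unsigned int kasan_stack_order)
{
	unsigned int order = 2 + kasan_stack_order;

	assert(order < 16);             /* keep the shift sane */
	return PAGE_SIZE << order;
}
```

With a KASAN_STACK_ORDER of 0 this gives 16384 bytes (16K); an order 
of 1 doubles it to 32768 bytes.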

> 4. kgdb.  kgdb doesn't appear to respect the kprobe blacklist at 
> all, so kgdb would blow up if it tried to breakpoint early or late 
> in syscall handling.  (Hmm.  I bet kgdb also blows up if you use it 
> to put a breakpoint early in do_int3.)

Yes, my answer to kernel debuggers is: "Don't do it then, or implement 
support for it more cleanly than this hackery."

> Thoughts?
> 
> Even if it turns out that we can't get rid of IST for #DB and #BP, I 
> bet we could simplify matters by rigging up all of the IST 
> entries to switch IST off for #DB and #BP immediately upon entry and 
> to leave them off until immediately before returning, thereby 
> simplifying the logic quite a bit.  I think this would be a pure 
> performance win -- the only path here in which performance matters 
> is NMI AFAICT, and the NMI code already does that, albeit rather 
> deeply buried.

I'd suggest we try to get rid of it and restart with a clean 
implementation.

Thanks,

	Ingo
