Date: Wed, 22 Apr 2015 08:52:03 +0200
From: Ingo Molnar
To: Andy Lutomirski
Cc: "H. Peter Anvin", Jan Beulich, X86 ML, "linux-kernel@vger.kernel.org", Linus Torvalds, Borislav Petkov, Denys Vlasenko, Steven Rostedt, Peter Zijlstra
Subject: Re: Simplifying or removing DEBUG_STACK?
Message-ID: <20150422065203.GA4038@gmail.com>

* Andy Lutomirski wrote:

> Hi all-
>
> On x86_64, we use IST for #BP and #DB.  On x86_32, we don't.
>
> We started using IST for #BP in:
>
> b556b35e98ad [PATCH] x86_64: Move int 3 handler to debug stack and
> allow to increase it.
>
> and we started using IST for #DB even earlier in:
>
> 7abe2c67299e [PATCH] x86-64 merge for 2.6.4
>
> This has some unpleasant side effects these days.  Primarily, it
> requires a bunch of ugly code to avoid recursive use of the debug
> stack when, say, an NMI interrupts do_int3 or do_debug and either hits
> a kprobe int3 or a #DB if it inadvertently touches a userspace
> watchpoint.  See TRACE_IRQS_OFF_DEBUG for another bit wart in that
> code.
>
> Here are all of the reasons I can come up with for using IST:
>
> 1. SYSENTER with TF set will immediately (or after one instruction --
> I'm not quite sure) cause #DB.  This is easy to handle -- we can just
> set up a sysenter stack just like x86_32.
>
> 2. #DB needs paranoid gsbase handling (due to SYSENTER if nothing
> else).  However, there's no real reason that IST and paranoid gsbase
> handling need to be tied together.
>
> 3. Stack usage.  Almost anything can hit a kprobe and any uaccess
> operation can hit a watchpoint.  I'm not sure how much of a problem
> this is.  If it is a real problem, we could use something more like
> the irqstack mechanism instead of IST.

This might have been an issue back when we still tried to fit things
into 8K kernel stacks (4K on 32-bit).

These days we have ~15K kernel stacks on 64-bit:

  arch/x86/include/asm/page_64_types.h:#define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)

and we also have irq stacks that dramatically reduce asynchronous
stack nesting effects.

> 4. kgdb.  kgdb doesn't appear to respect the kprobe blacklist at
> all, so kgdb would blow up if it tried to breakpoint early or late
> in syscall handling.  (Hmm.  I bet kgdb also blows up if you use it
> to put a breakpoint early in do_int3.)

Yes, my answer to kernel debuggers is: "Don't do it then, or implement
support for it more cleanly than this hackery."

> Thoughts?
>
> Even if it turns out that we can't get rid of IST for #DB and #BP, I
> bet we could simplify matters by rigging up all of the IST
> entries to switch IST off for #DB and #BP immediately upon entry and
> to leave them off until immediately before returning, thereby
> simplifying the logic quite a bit.  I think this would be a pure
> performance win -- the only path here in which performance matters
> is NMI AFAICT, and the NMI code already does that, albeit rather
> deeply buried.

I'd suggest we try to get rid of it and restart with a clean
implementation.

Thanks,

	Ingo