From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from merlin.infradead.org ([205.233.59.134]:43239 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751625AbcEXRJx (ORCPT ); Tue, 24 May 2016 13:09:53 -0400 Date: Tue, 24 May 2016 19:09:34 +0200 From: Peter Zijlstra To: Andy Lutomirski Cc: Andy Lutomirski , X86 ML , Borislav Petkov , Andi Kleen , lkml@vger.kernel.org, Oleg Nesterov , stable Subject: Re: [PATCH] x86/traps: Don't for in_interrupt() to return true in IST handlers Message-ID: <20160524170934.GD15189@worktop.bitpit.net> References: <20160524085945.GE3192@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: stable-owner@vger.kernel.org List-ID: On Tue, May 24, 2016 at 08:43:57AM -0700, Andy Lutomirski wrote: > > So this has implications for code like > > kernel/events/internal.h:get_recursion_context() and > > kernel/trace/trace.c:get_trace_buf(). > > > > Which use a sequence of: in_nmi(), in_irq(), in_softirq() to pick 1 out > > of 4 possible contexts. > > > > I would really like the Changelog to reflect on this. The current code > > will have ISTs land in in_irq(), with this chance, not so much. > > I can change the changelog. This is basically all I'm asking. > > Now ISTs as a whole invalidate the whole 'we have only 4 contexts' and > > the mapping back to those 4 is going to be somewhat arbitrary I suspect, > > but changes like this should be very much aware of these things. And > > make an explicit choice. > > I'm not so comfortable with trying to make any particular guarantees > about what all the in_xyz() things will return for different entry > types and how they nest. For example, debug can nest inside itself > quite easily (at one point I even had a user program to force it to > happen) -- this can trigger when a watchpoint nests inside a > single-step trap, and it can also happen when a watchpoint handler is > interrupted by an NMI than then recursively triggers the watchpoint. > The latter could easily result in nested NMIs that are directly > visible to the trace or event code. > > On x86, there's also MCE, which is NMI-ish, and NMI and MCE can freely > nest inside each other. (Blech.) Right; all the nesting possibilities are endless and insane. And I don't think it makes sense to try and enumerate them all and worse; expose this to generic code. I think having the 4: task, softirq, hardirq, nmi is fine. We just need to be a little careful with how we map to them, that we don't wreck some common situation and suddenly start loosing trace data. > Would it make more sense to adjust the trace code to have a percpu > nesting count and to match up get_trace_buf with put_trace_buf to > decrement the count? The event code looks like the same thing could > happen. We have, but its coupled with static storage, such as to avoid having to do it on stack or *horror* dynamically allocate. So in order to protect these resource we have the per context nesting count. > Also, on further reflection, I'm going to get rid of the 3* hack. Makes sense.