Date: Wed, 12 Nov 2014 00:09:26 +0100
From: Borislav Petkov
To: Andy Lutomirski
Cc: X86 ML, "linux-kernel@vger.kernel.org", Peter Zijlstra, Oleg Nesterov, Tony Luck, Andi Kleen
Subject: Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace
Message-ID: <20141111230926.GR31490@pd.tnic>
References: <20141111213628.GP31490@pd.tnic> <20141111223316.GQ31490@pd.tnic>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote:
> I wonder what the IRET is for. There had better not be another magic
> IRET unmask thing. I'm guessing that the actual semantics are that
> nothing whatsoever can mask #MC, but that a second #MC when MCIP is
> still set is a shutdown condition.

Hmmm, both manuals are unclear as to what exactly reenables #MC. So
forget about IRET and look at this:

"When the processor receives a machine check when MCIP is set, it
automatically enters the shutdown state."

so this really reads like a second #MC while the first is happening
would shut down the system - regardless of whether I'm still in #MC
context or not, running the first #MC handler.

I guess I needz me some hw people to actually confirm.

> Define "atomic".
>
> You're still running with irqs off and MCIP set. At some point,

Yes, I need to be atomic wrt another #MC so that I can read out the MCA
MSRs in time and undisturbed.

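The sentence quoted from the manual can be read as a tiny state machine. Below is a minimal C sketch of that reading, as a toy model and not kernel code. The `model_*` names and `struct cpu_model` are made up for illustration; only the position of MCIP (bit 2 of MCG_STATUS) comes from the architecture, and whether the hardware really behaves like this is exactly what still needs confirming.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MCIP is bit 2 of MCG_STATUS; everything else here is a toy model. */
#define MODEL_MCG_STATUS_MCIP (1ULL << 2)

struct cpu_model {
	uint64_t mcg_status;	/* stand-in for the MCG_STATUS MSR */
	bool shutdown;		/* set once the modelled CPU has shut down */
};

/*
 * Deliver a machine check to the modelled CPU.
 * Returns true if the #MC handler gets to run, false if the CPU
 * is (or just went) into shutdown because MCIP was still set.
 */
static bool model_raise_mc(struct cpu_model *cpu)
{
	assert(cpu != NULL);

	if (cpu->shutdown)
		return false;

	if (cpu->mcg_status & MODEL_MCG_STATUS_MCIP) {
		/* Second #MC while the first is in progress: shutdown. */
		cpu->shutdown = true;
		return false;
	}

	cpu->mcg_status |= MODEL_MCG_STATUS_MCIP;
	return true;
}

/* Handler is done with the MCA MSRs: clear MCIP so a new #MC is survivable. */
static void model_clear_mcip(struct cpu_model *cpu)
{
	assert(cpu != NULL);
	cpu->mcg_status &= ~MODEL_MCG_STATUS_MCIP;
}
```

In this model a second #MC is survivable only after MCIP has been cleared, which is why the handler wants to finish reading the MCA MSRs before doing anything else.
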
> you're presumably done with all of the machine check registers, and
> you can clear MCIP. Now, if current == victim, you can enable irqs
> and do whatever you want.

This is the key: if I enable irqs and the process gets scheduled on
another CPU, I lose. So I have to be able to say: before you run this
task on any CPU, kill it.

> In my mind, the benefit is that you don't need to think about how to
> save your information and arrange to get called back the next time
> that the victim task is in a non-atomic context, since you *are* the
> victim task and you're running in normal irqs-disabled kernel mode.
>
> In contrast, with the current entry code, if you enable IRQs or do
> anything that could sleep, you're on the wrong stack, so you'll crash.
> That means that taking mutexes, even after clearing MCIP, is
> impossible.

Hmm, it is late here and I need to think about this with a clear head
again but I think I can see the benefit of this to a certain extent.

However(!), I need to be able to run undisturbed and do the minimum work
in the #MC handler before I reenable MCEs.

But Tony also has a valid point: what is going to happen if I get
another MCE while doing the memory_failure() dance? I guess if
memory_failure() takes proper locks, the second #MC will get to wait
until the first is done. But who knows in reality ...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
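
The "before you run this task on any CPU, kill it" requirement could be sketched as a flag that the #MC handler sets and the scheduler checks before switching the task in. This is only an illustration under assumptions: `struct model_task`, `model_mark_for_kill()` and `model_schedule_in()` are hypothetical names, not existing kernel interfaces, and the real scheduler paths are far more involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model; none of these names are kernel APIs. */
struct model_task {
	int pid;
	atomic_bool mce_kill_pending;	/* set from the #MC handler */
	bool killed;			/* set when the scheduler kills the task */
	int last_cpu;			/* -1 until the task has run somewhere */
};

/* Called in #MC context: no sleeping, no locks, just record the verdict. */
static void model_mark_for_kill(struct model_task *t)
{
	assert(t != NULL);
	atomic_store(&t->mce_kill_pending, true);
}

/*
 * Called by the modelled scheduler before running t on cpu.
 * Returns true if the task was allowed to run, false if it was
 * killed instead because a machine check had condemned it.
 */
static bool model_schedule_in(struct model_task *t, int cpu)
{
	assert(t != NULL);

	if (atomic_load(&t->mce_kill_pending)) {
		t->killed = true;
		return false;
	}

	t->last_cpu = cpu;
	return true;
}
```

The point of the sketch is that the check sits on the path every CPU must take before running the task, so it does not matter which CPU picks the task up after irqs are enabled again.
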