From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751521AbaKKQNR (ORCPT ); Tue, 11 Nov 2014 11:13:17 -0500
Received: from mail.skyhub.de ([78.46.96.112]:33367 "EHLO mail.skyhub.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751097AbaKKQNQ (ORCPT ); Tue, 11 Nov 2014 11:13:16 -0500
Date: Tue, 11 Nov 2014 17:13:09 +0100
From: Borislav Petkov
To: Andy Lutomirski
Cc: Chen Gong, X86 ML, Peter Zijlstra, Oleg Nesterov, Tony Luck, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v2 4/5] x86/mce: Simplify flow when handling recoverable memory errors
Message-ID: <20141111161309.GG31490@pd.tnic>
References: <1407998986-1834-1-git-send-email-gong.chen@linux.intel.com> <1407998986-1834-5-git-send-email-gong.chen@linux.intel.com> <20141111114248.GD31490@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 11, 2014 at 07:42:48AM -0800, Andy Lutomirski wrote:
> The last time I looked at the MCE code, I got a bit lost in the
> control flow. Is there ever a userspace-killing MCE that's delivered
> from kernel mode?

Yep. While you're executing a userspace process, an #MC can be raised
that reports an error for which action is required; see, e.g., all
the MCE_AR_SEVERITY errors in
arch/x86/kernel/cpu/mcheck/mce-severity.c.

The error happened within the context of current, so we run the #MC
handler, which decides that the process needs to be killed in order to
contain the error. So after we exit the handler, and before the
process can be scheduled in again on any core, we want to actually
kill it and poison all its memory.

> By that, I mean that I think that all userspace-killing MCEs go have
> user_mode_vm(regs) and go through paranoid_exit.

Yes.
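A minimal userspace model of the deferred flow described here, as a
sketch and not the kernel code: the #MC handler only records the
faulting address and whether the context is restartable
(MCG_STATUS_RIPV), sets a notify flag, and the heavy lifting happens
later on the return-to-user path. struct task, mc_handler(),
fake_memory_failure() and notify_resume() are hypothetical stand-ins,
and the MF_MUST_KILL value is a model value, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model constants. MCG_STATUS_RIPV mirrors bit 0 of IA32_MCG_STATUS;
 * MF_MUST_KILL's value is arbitrary for this model. */
#define MCG_STATUS_RIPV (1ULL << 0)
#define MF_MUST_KILL    (1 << 2)
#define PAGE_SHIFT      12

/* Simplified severity grading; the kernel has more levels. */
enum severity { MCE_NO_SEVERITY, MCE_AR_SEVERITY };

/* Hypothetical, trimmed-down record of one machine check. */
struct mce {
	uint64_t addr;       /* physical address of the error */
	uint64_t mcgstatus;  /* copy of IA32_MCG_STATUS */
};

/* Hypothetical task: the two proposed task_struct members plus a
 * boolean standing in for the TIF_MCE_NOTIFY thread flag. */
struct task {
	uint64_t paddr;
	bool restartable;
	bool mce_notify;
	uint64_t poisoned_pfn;
	bool killed;
};

/* Models the #MC handler: do the minimum, record state, defer the rest. */
static void mc_handler(struct task *t, const struct mce *m,
		       enum severity worst)
{
	if (worst == MCE_AR_SEVERITY) {
		t->paddr = m->addr;
		t->restartable = !!(m->mcgstatus & MCG_STATUS_RIPV);
		t->mce_notify = true;
	}
}

/* Stand-in for memory_failure(): records the pfn and kills only when
 * MF_MUST_KILL is passed; the real function does much more. */
static int fake_memory_failure(struct task *t, uint64_t pfn, int flags)
{
	t->poisoned_pfn = pfn;
	if (flags & MF_MUST_KILL)
		t->killed = true;
	return 0;
}

/* Models the return-to-user path, which in the kernel runs in process
 * context where memory_failure() may sleep. */
static int notify_resume(struct task *t)
{
	int flags = 0;

	if (!t->mce_notify)
		return 0;
	t->mce_notify = false;   /* models clear_thread_flag() */

	if (!t->restartable)
		flags |= MF_MUST_KILL;

	return fake_memory_failure(t, t->paddr >> PAGE_SHIFT, flags);
}
```

The point of the split is that the handler touches only per-task
fields, since very little is safe in #MC context; anything that may
sleep or send signals runs later from the notify path.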
> If so, why do you need to jump through hoops at all? You can't call
> do_exit, but it should be completely safe to force a fatal signal and
> let the scheduler and signal code take care of killing the process,
> right? For that matter, you should also be able to poke at vm
> structures, etc.

Well, we do that already: memory-failure.c does kill the processes
when it decides to.

The only question is whether adding two new members to task_struct is
OK. It is nicely convenient and everything falls into place. In the
#MC handler we do:

 	if (worst == MCE_AR_SEVERITY) {
 		/* schedule action before return to userland */
+		current->paddr = m.addr;
+		current->restartable = !!(m.mcgstatus & MCG_STATUS_RIPV);
 		set_thread_flag(TIF_MCE_NOTIFY);
 	}

and then, before we return to userspace, we do:

+	if (!current->restartable)
 		flags |= MF_MUST_KILL;

 	if (memory_failure(pfn, MCE_VECTOR, flags) < 0) {

and MF_MUST_KILL (set when RIPV is clear, i.e. the interrupted
instruction pointer is not valid for restarting) makes sure
memory_failure() does a force_sig().

So I think this is OK. I only suspect that people might object to the
two new task_struct members, but it looks clean to me this way. IMHO
at least.

> Or is there a meaningful case where mce_notify_process needs to help
> with recovery but the original MCE happened with !user_mode_vm(regs)?

Well, for the !user_mode_vm(regs) case we panic anyway.

Thanks Andy.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.