From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755676AbbCSNVX (ORCPT <rfc822;w@1wt.eu>);
	Thu, 19 Mar 2015 09:21:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:48727 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754618AbbCSNVT (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 19 Mar 2015 09:21:19 -0400
Message-ID: <550ACD41.6040607@redhat.com>
Date: Thu, 19 Mar 2015 14:21:05 +0100
From: Denys Vlasenko <dvlasenk@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Andy Lutomirski <luto@amacapital.net>
CC: Linus Torvalds <torvalds@linux-foundation.org>,
        Stefan Seyfried <stefan.seyfried@googlemail.com>,
        Takashi Iwai <tiwai@suse.de>, X86 ML <x86@kernel.org>,
        LKML <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?
References: <5505400B.8050300@message-id.googlemail.com> <s5hr3smfl11.wl-tiwai@suse.de> <s5hy4mukxpj.wl-tiwai@suse.de> <s5h4mpi5hbv.wl-tiwai@suse.de> <CALCETrVCMEcCOHZ35LneCU6uGH+W5SF0groKbUGp2zTjWpzB0w@mail.gmail.com> <5509CBF7.3040602@message-id.googlemail.com> <CALCETrU2R020HVniX2sczxexPO2qhEPbS++9DXzcxeycgxoGQg@mail.gmail.com> <CA+55aFwT4BJVR10i2Cm8pMH0UGd-J3EwnEUYKf3BWTM0awebbA@mail.gmail.com> <5509F161.3010101@redhat.com> <CALCETrXZvSiT41+AYAPizSsGZ_=O=7wmb+Lwo_ChEZySxUnH-A@mail.gmail.com>
In-Reply-To: <CALCETrXZvSiT41+AYAPizSsGZ_=O=7wmb+Lwo_ChEZySxUnH-A@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/18/2015 10:55 PM, Andy Lutomirski wrote:
> On Wed, Mar 18, 2015 at 2:42 PM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> in 'irq_return_via_sysret' is new to 4.0, and instead of entering the
>>> kernel with a user stack poiinter, maybe we're *exiting* the kernel,
>>> and have just reloaded the user stack pointer when "USERGS_SYSRET64"
>>> takes some fault.
>>
>> Yes, so far we happily thought that SYSRET never fails...
>>
>> This merits adding some code which would at least BUG_ON
>> if the faulting address is seen to match SYSRET64.
> 
> sysret64 can only fail with #GP, and we're totally screwed if that
> happens, although I agree about the BUG_ON in principle.  Where would
> we add it that would help in this case, though?  We never even made it
> to C code.

I propose widening the check to catch any case where
we enter an exception from CPL0 and find that our RSP
is bad. This would cover the faulting-SYSRET case as well as
possible future obscure bugs.

This patch stops the CPU dead if we find ourselves
with a userspace RSP (not the saved RSP, but the _actual_ %RSP register)
in an exception handler prologue:

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a0a3a6e..53a34ba 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -930,6 +930,12 @@ ENTRY(\sym)
 	INTR_FRAME
 	.endif

+	testq %rsp,%rsp
+	/* If RSP is positive, we are in kernel but have userspace RSP. */
+	/* We corrupted user stack already by storing iret frame there. */
+	/* This is supposed to be impossible. */
+0:	jns 0b
+
 	ASM_CLAC
 	PARAVIRT_ADJUST_EXCEPTION_FRAME


Hopefully the NMI watchdog will then kill it, and we'll get better data.
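
For anyone wondering why a bare sign test is enough: on x86-64, kernel
addresses live in the upper canonical half (bit 63 set), so a "positive"
RSP can only be a userspace pointer. Here is the same test as a rough
userspace C sketch. The function name and the example addresses below are
made up for illustration and are not taken from the kernel tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring "testq %rsp,%rsp; jns": the sign flag
 * is just bit 63 of the value. Assumes the usual x86-64 split where
 * kernel addresses are in the upper canonical half.
 */
static bool rsp_looks_like_userspace(uint64_t rsp)
{
	return (rsp >> 63) == 0;	/* bit 63 clear: "positive" */
}

/* Illustrative values only, not real stack addresses from the report. */
static void check_examples(void)
{
	assert(rsp_looks_like_userspace(0x00007ffc12345000ULL));
	assert(!rsp_looks_like_userspace(0xffff880012345000ULL));
}
```

The asm version spins in place instead of returning a result, because by
the time it runs we cannot trust the stack enough to call anything.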