From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030493AbcBZSSW (ORCPT <rfc822;w@1wt.eu>);
	Fri, 26 Feb 2016 13:18:22 -0500
Received: from mail-pa0-f47.google.com ([209.85.220.47]:34727 "EHLO
	mail-pa0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1030478AbcBZSST (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 26 Feb 2016 13:18:19 -0500
Subject: Re: BUG: unable to handle kernel paging request from pty_write [was:
 Linux 4.4.2]
To: Linus Torvalds <torvalds@linux-foundation.org>
References: <CA+55aFxXAiJe=NPuxJ3vQFn6T12_nVV2+Yz4yChYbavtB+1caw@mail.gmail.com>
Cc: Jiri Slaby <jslaby@suse.cz>, Greg KH <gregkh@linuxfoundation.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        stable <stable@vger.kernel.org>, lwn@lwn.net,
        Steven Rostedt <rostedt@goodmis.org>
From: Peter Hurley <peter@hurleysoftware.com>
Message-ID: <56D096E4.3010006@hurleysoftware.com>
Date: Fri, 26 Feb 2016 10:18:12 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <CA+55aFxXAiJe=NPuxJ3vQFn6T12_nVV2+Yz4yChYbavtB+1caw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/26/2016 10:05 AM, Linus Torvalds wrote:
> On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley <peter@hurleysoftware.com> wrote:
>>
>> So more analysis would seem to confirm that RSP has been bumped +8
>> while in ttwu_stat() so when the epilog executed, register restore
>> was off by 1 qword. However, there's nothing in ttwu_stat() that
>> results in stack pointer offset by +1 qword from prolog.
> 
> I agree.
> 
> That's why I'm actually starting to suspect that it's an AMD microcode
> bug that we know very little about. There's apparently register
> corruption (the guess being from NMI handling, but virtualization was
> also involved) under some circumstances.

Yep, that could explain it.

> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window.

I'll wait for Jiri to confirm before sinking more time here.


> But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
> 
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
> 
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in in this thread. He was talking to some AMD
> people, but I don't know exactly who.

Ok, thanks for the info.