From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030493AbcBZSSW (ORCPT ); Fri, 26 Feb 2016 13:18:22 -0500
Received: from mail-pa0-f47.google.com ([209.85.220.47]:34727 "EHLO
	mail-pa0-f47.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1030478AbcBZSST (ORCPT );
	Fri, 26 Feb 2016 13:18:19 -0500
Subject: Re: BUG: unable to handle kernel paging request from pty_write
 [was: Linux 4.4.2]
To: Linus Torvalds
References:
Cc: Jiri Slaby, Greg KH, Linux Kernel Mailing List, Andrew Morton,
 stable, lwn@lwn.net, Steven Rostedt
From: Peter Hurley
Message-ID: <56D096E4.3010006@hurleysoftware.com>
Date: Fri, 26 Feb 2016 10:18:12 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/26/2016 10:05 AM, Linus Torvalds wrote:
> On Fri, Feb 26, 2016 at 9:52 AM, Peter Hurley wrote:
>>
>> So more analysis would seem to confirm that RSP has been bumped +8
>> while in ttwu_stat() so when the epilog executed, register restore
>> was off by 1 qword. However, there's nothing in ttwu_stat() that
>> results in stack pointer offset by +1 qword from prolog.
>
> I agree.
>
> That's why I'm actually starting to suspect that it's an AMD microcode
> bug that we know very little about. There's apparently register
> corruption (the guess being from NMI handling, but virtualization was
> also involved) under some circumstances.

Yep, that could explain it.

> Of course, if Jiri isn't actually running this on an AMD CPU, that
> theory flies right out the window.

I'll wait for Jiri to confirm before sinking more time here.

> But we do have a reported oops on
> the security list that looks totally different in the big picture, but
> shares the exact same "corrupted stack pointer register state
> resulting in crazy instruction pointer, resulting in NX fault"
> behavior in the end.
>
> In the other case, microcode patchlevel 0x0600081c was fine, and
> 0x06000832 is the one exhibiting the corruption problem.
>
> I've contacted Robert Święcki (who found the microcode problem) in
> case he wants to weigh in in this thread. He was talking to some AMD
> people, but I don't know exactly who.

Ok, thanks for the info.