From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755224AbbLKMPM (ORCPT <rfc822;w@1wt.eu>);
	Fri, 11 Dec 2015 07:15:12 -0500
Received: from mx1.redhat.com ([209.132.183.28]:55202 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1755136AbbLKMPJ (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Fri, 11 Dec 2015 07:15:09 -0500
Subject: Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent
 state
To: Borislav Petkov <bp@alien8.de>
References: <1449769137-8668-1-git-send-email-pbonzini@redhat.com>
 <20151210180945.GB3831@pd.tnic> <5669C137.7080601@redhat.com>
 <20151211102244.GA3660@pd.tnic> <566AA85A.9000507@redhat.com>
 <20151211114112.GA3704@pd.tnic>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        =?UTF-8?B?SsO2cmcgUsO2ZGVs?= <joro@8bytes.org>
From: Paolo Bonzini <pbonzini@redhat.com>
X-Enigmail-Draft-Status: N1110
Message-ID: <566ABE48.8020408@redhat.com>
Date: Fri, 11 Dec 2015 13:15:04 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151211114112.GA3704@pd.tnic>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org



On 11/12/2015 12:41, Borislav Petkov wrote:
> On Fri, Dec 11, 2015 at 11:41:30AM +0100, Paolo Bonzini wrote:
>> It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
>> to the zapping thing.
> 
> Ah right, guest pages should be userspace addresses, come to think of
> it.
> 
>> Can you grab the kallsyms before making it crash?
> 
> Attached. It was a different corruption this time, see below. This time
> we don't even have a page table, PGD is 0, rIP is 1. (Fun :-))

Hmm, you had:

- RIP=0 in the original report (start_this_handle)
- RIP=0 in the second (mutex_lock_nested in ext4)
- RIP=1 now

The more interesting report is the other one, which doesn't have a small RIP.
Its RIP is slightly larger than the stack pointer (RSP + 0x10), so it is
likely a frame pointer.  That in turn means the call trace is correct, and
the oops might have happened closer to the actual corruption.

[  959.548625] RIP: 0010:[<ffff8800b9f9bdf0>]  [<ffff8800b9f9bdf0>] 0xffff8800b9f9bdf0
[  959.556338] RSP: 0018:ffff8800b9f9bde0  EFLAGS: 00010206
[  959.618579] Stack:
[  959.620607]  ffffffffa02d5e17 ffff8800b7d48000 ffff8800b9f9be08 ffffffffa02bdb1f
[  959.628104]  0000000000000000 ffff8800b9f9be98 ffffffffa02bdc7b ffff8804242a4400
[  959.635601]  0000000000000070 0000000000004000 ffffffff81a3c1e0 ffff8800b7ca5e00
[  959.643114] Call Trace:
[  959.645599]  [<ffffffffa02d5e17>] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[  959.652081]  [<ffffffffa02bdb1f>] ? vcpu_put+0x1f/0x60 [kvm]
[  959.657782]  [<ffffffffa02bdc7b>] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[  959.664169]  [<ffffffff811a0930>] ? do_vfs_ioctl+0x2e0/0x540
[  959.669855]  [<ffffffff811ac8e9>] ? __fget_light+0x29/0x90
[  959.675364]  [<ffffffff811a0bdc>] ? SyS_ioctl+0x4c/0x90
[  959.680618]  [<ffffffff816e2d5b>] ? entry_SYSCALL_64_fastpath+0x16/0x6f
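
The reading of the RIP/RSP values in this trace can be sanity-checked with a
few lines of Python.  This is only a sketch: the 16 KiB THREAD_SIZE is an
assumption about the x86-64 kernel stack size on this build, and
stack_bounds()/points_into_stack() are hypothetical helpers written for
illustration, not anything from the kernel tree.

```python
# Values copied from the oops above.
RIP = 0xffff8800b9f9bdf0
RSP = 0xffff8800b9f9bde0

# ASSUMPTION: 16 KiB, size-aligned kernel stacks (x86-64 default of that era).
THREAD_SIZE = 16 * 1024


def stack_bounds(rsp, thread_size=THREAD_SIZE):
    """Return (lowest, one-past-highest) address of the stack holding rsp."""
    base = rsp & ~(thread_size - 1)  # round down to the stack's alignment
    return base, base + thread_size


def points_into_stack(addr, rsp, thread_size=THREAD_SIZE):
    """True if addr lies inside the same kernel stack as rsp."""
    lo, hi = stack_bounds(rsp, thread_size)
    return lo <= addr < hi


# First row of the "Stack:" dump, to contrast text addresses with stack ones.
stack_words = [
    0xffffffffa02d5e17, 0xffff8800b7d48000,
    0xffff8800b9f9be08, 0xffffffffa02bdb1f,
]

print("RIP - RSP =", hex(RIP - RSP))
print("RIP points into the stack:", points_into_stack(RIP, RSP))
for word in stack_words:
    print(hex(word), "in stack:", points_into_stack(word, RSP))
```

Under that assumption RIP lands 0x10 bytes above RSP, inside the same stack,
which is what you would expect if a saved frame pointer had been used as a
return address; the module text addresses (0xffffffffa02d5e17 and friends)
fall outside it.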

My wild guess is that RSP is getting corrupted, but I'll have to try to
reproduce it to figure out what happens.

The last thing I need from you (hopefully) is your kernel .config.  If you
have some time, it would also help to check whether you can reproduce it with
an older kernel version; 4.4-rc1 and 4.3 would be good ones to try.

Paolo