From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755224AbbLKMPM (ORCPT ); Fri, 11 Dec 2015 07:15:12 -0500
Received: from mx1.redhat.com ([209.132.183.28]:55202 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755136AbbLKMPJ (ORCPT ); Fri, 11 Dec 2015 07:15:09 -0500
Subject: Re: [PATCH] kvm: x86: move tracepoints outside extended quiescent state
To: Borislav Petkov
References: <1449769137-8668-1-git-send-email-pbonzini@redhat.com>
        <20151210180945.GB3831@pd.tnic> <5669C137.7080601@redhat.com>
        <20151211102244.GA3660@pd.tnic> <566AA85A.9000507@redhat.com>
        <20151211114112.GA3704@pd.tnic>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        =?UTF-8?B?SsO2cmcgUsO2ZGVs?=
From: Paolo Bonzini
X-Enigmail-Draft-Status: N1110
Message-ID: <566ABE48.8020408@redhat.com>
Date: Fri, 11 Dec 2015 13:15:04 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151211114112.GA3704@pd.tnic>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/12/2015 12:41, Borislav Petkov wrote:
> On Fri, Dec 11, 2015 at 11:41:30AM +0100, Paolo Bonzini wrote:
>> It would be a kvm hypervisor page, not a kvm guest page, hence unrelated
>> to the zapping thing.
>
> Ah right, guest pages should be userspace addresses, come to think of
> it.
>
>> Can you grab the kallsyms before making it crash?
>
> Attached. It was a different corruption this time, see below. This time
> we don't even have a page table, PGD is 0, rIP is 1.
> (Fun :-))

Hmm, you had:

- RIP=0 in the original report (start_this_handle)

- RIP=0 in the second (mutex_lock_nested in ext4)

- RIP=1 now

The more interesting one is the other one which doesn't have a small
RIP, because it has RIP that is slightly larger than the stack pointer,
meaning it's likely a frame pointer. And this means in turn that the
call trace is correct, and the bug might have happened closer to the
actual corruption.

[ 959.548625] RIP: 0010:[] [] 0xffff8800b9f9bdf0
[ 959.556338] RSP: 0018:ffff8800b9f9bde0 EFLAGS: 00010206
[ 959.618579] Stack:
[ 959.620607]  ffffffffa02d5e17 ffff8800b7d48000 ffff8800b9f9be08 ffffffffa02bdb1f
[ 959.628104]  0000000000000000 ffff8800b9f9be98 ffffffffa02bdc7b ffff8804242a4400
[ 959.635601]  0000000000000070 0000000000004000 ffffffff81a3c1e0 ffff8800b7ca5e00
[ 959.643114] Call Trace:
[ 959.645599]  [] ? kvm_arch_vcpu_put+0x17/0x40 [kvm]
[ 959.652081]  [] ? vcpu_put+0x1f/0x60 [kvm]
[ 959.657782]  [] ? kvm_vcpu_ioctl+0x11b/0x6f0 [kvm]
[ 959.664169]  [] ? do_vfs_ioctl+0x2e0/0x540
[ 959.669855]  [] ? __fget_light+0x29/0x90
[ 959.675364]  [] ? SyS_ioctl+0x4c/0x90
[ 959.680618]  [] ? entry_SYSCALL_64_fastpath+0x16/0x6f

My wild guess is that RSP is getting corrupted, but I guess I'll have to
try to reproduce to figure out what happens. The last thing I need from
you (hopefully) is a Kconfig.

If you have some time, it would be great to check if you can reproduce
it with an older kernel version---trying 4.4-rc1 and 4.3 would be great.

Paolo