From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chris Friesen
Subject: Re: [Qemu-devel] kvm bug in __rmap_clear_dirty during live migration
Date: Wed, 22 Feb 2017 07:31:26 -0600
Message-ID: <58AD92AE.6040502@windriver.com>
References: <589C7E96.9060905@huawei.com> <589D83CE.1090803@huawei.com> <589DDC05.9010807@windriver.com> <58AA51D6.6020508@huawei.com> <1487565495.3740.27.camel@intel.com> <58AD0094.90304@windriver.com> <4dd92012-626a-2d80-9adb-0be398f73eb1@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "kvm@vger.kernel.org" , "fangying1@huawei.com" , "herongguang.he@huawei.com" , "xudong.hao@linux.intel.com" , "qemu-devel@nongnu.org" , "wangxinxin.wang@huawei.com" , "kai.huang@linux.intel.com" , "rkrcmar@redhat.com" , "guangrong.xiao@linux.intel.com"
To: Paolo Bonzini , "Han, Huaitong" , "hangaohuai@huawei.com"
Return-path:
Received: from mail.windriver.com ([147.11.1.11]:61950 "EHLO mail.windriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754269AbdBVNbs (ORCPT ); Wed, 22 Feb 2017 08:31:48 -0500
In-Reply-To: <4dd92012-626a-2d80-9adb-0be398f73eb1@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 02/22/2017 05:15 AM, Paolo Bonzini wrote:
>
>
> On 22/02/2017 04:08, Chris Friesen wrote:
>> On 02/19/2017 10:38 PM, Han, Huaitong wrote:
>>> Hi, Gaohuai
>>>
>>> I tried to debug the problem, and I found the indirect cause may be that
>>> the rmap value is not cleared when KVM mmu page is freed. I have read
>>> code without the root cause. Can you stable reproduce the issue?
>>> Many guesses need to be verified.
>>
>> In both cases it seems to have been triggered by repeatedly
>> live-migrating a KVM virtual machine between two hypervisors with
>> Broadwell CPUs running the latest CentOS 7.
>>
>> It's a race of some sort, it doesn't happen every time.
>
> Can you reproduce it with kernel 4.8+?
> I'm suspecting commit
> 4e59516a12a6 ("kvm: vmx: ensure VMCS is current while enabling PML",
> 2016-07-14) to be the fix.

I can't easily try a newer kernel; the software package we're using has
kernel patches that would have to be ported. I'm also at a conference and
don't really have time to set up a pair of test machines from scratch
with a custom kernel.

Chris