From mboxrd@z Thu Jan 1 00:00:00 1970
From: Boris Ostrovsky
Subject: Re: [xen-unstable test] 33690: regressions - FAIL
Date: Mon, 26 Jan 2015 10:08:19 -0500
Message-ID: <54C65863.6040100@oracle.com>
References: <54C62D560200007800059615@mail.emea.novell.com> <54C63560020000780005967F@mail.emea.novell.com> <54C6540B.7080506@citrix.com> <54C65461.2000503@oracle.com> <54C6559F.8010700@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1YFlHJ-0006M4-GU for xen-devel@lists.xenproject.org; Mon, 26 Jan 2015 15:08:37 +0000
In-Reply-To: <54C6559F.8010700@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper, Jan Beulich
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

On 01/26/2015 09:56 AM, Andrew Cooper wrote:
> On 26/01/15 14:51, Boris Ostrovsky wrote:
>> On 01/26/2015 09:49 AM, Andrew Cooper wrote:
>>> On 26/01/15 11:38, Jan Beulich wrote:
>>>>>>> On 26.01.15 at 12:04, wrote:
>>>>>>>> On 24.01.15 at 13:54, wrote:
>>>>>> test-amd64-amd64-xl-qemut-win7-amd64 7 windows-install fail
>>>>>> REGR. vs. 33637
>>>>> Jan 24 00:35:16.262627 (XEN) ----[ Xen-4.6-unstable x86_64
>>>>> debug=y Not tainted ]----
>>>>> Jan 24 00:35:16.478599 (XEN) CPU: 1
>>>>> Jan 24 00:35:16.478624 (XEN) RIP: e008:[<0000000000000000>]
>>>>> 0000000000000000
>>>>> Jan 24 00:35:16.486596 (XEN) RFLAGS: 0000000000010082 CONTEXT:
>>>>> hypervisor
>>>>> ...
>>>>> Jan 24 00:35:16.678620 (XEN) Xen call trace:
>>>>> Jan 24 00:35:16.678650 (XEN) []
>>>>> vpmu_do_interrupt+0x2f/0x8a
>>>>> Jan 24 00:35:16.686605 (XEN) []
>>>>> pmu_apic_interrupt+0x33/0x35
>>>>> Jan 24 00:35:16.698582 (XEN) [] do_IRQ+0x9c/0x624
>>>>> Jan 24 00:35:16.698615 (XEN) []
>>>>> common_interrupt+0x62/0x70
>>>>> Jan 24 00:35:16.698653 (XEN) []
>>>>> _spin_unlock_irq+0x30/0x31
>>>>> Jan 24 00:35:16.706604 (XEN) []
>>>>> __do_softirq+0x81/0x8c
>>>>> Jan 24 00:35:16.706638 (XEN) []
>>>>> do_softirq+0x13/0x15
>>>>> Jan 24 00:35:16.718591 (XEN) []
>>>>> vmx_asm_do_vmentry+0x2a/0x50
>>>> I think I see what the problem here is: Commit 8097616fbd
>>>> ("x86/VPMU: handle APIC_LVTPC accesses") gives the guest
>>>> control over LVTPC.mask regardless of whether the vPMU was
>>>> actually initialized for it. Supposedly in the case above the
>>>> guest is being run with core2_no_vpmu_ops, which in
>>>> particular has .do_interrupt == NULL. It's not immediately
>>>> clear whether vpmu_lvtpc_update() should do the check or its
>>>> (sole) caller. In any event I'm going to revert that commit as
>>>> the primary suspect for causing the regression.
>>> I have just fallen over this as well. I second a revert in the absence
>>> of a clear way to fix the patch.
>> I can't reproduce this -- neither at this patch level nor at full series.
>>
>> Yes, we can test for do_interrupt presence in vpmu_lvtpc_update() (or
>> in vpmu_interrupt() itself) but since we cannot arm the counters
>> (there is no do_wrmsr op) I am not sure I understand what can trigger
>> this interrupt.
>>
>> -boris
>>
>>
> As Jan explained, The patch in question allows guests (windows in both
> problematic cases) to arm LVTPC, with a vpmu instance with a NULL
> pointer for do_interrupt.

Right, I understand that. But you'd need to arm the counters (not just
APIC) for the interrupt to happen, wouldn't you? And I don't see how
that can happen.
This issue also appears to happen on Debian, so it's not Windows only.
In any case, you should indeed revert this until I resend a safer
patch. Before I do that I'd like to be able to reproduce this though.

-boris

>
> When a pmu apic interrupt arrives, the interrupt handler dies from a
> NULL function pointer dereference.
>
> ~Andrew
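
For reference, the guard discussed in this thread (testing for a
do_interrupt handler before the PMU interrupt path calls through it)
could look roughly like the standalone sketch below. This is only an
illustration and not the Xen code: struct demo_vpmu_ops, struct
demo_vpmu, demo_vpmu_do_interrupt(), demo_handler() and the two ops
tables are hypothetical names made up here. Only the do_interrupt field
name is taken from the discussion.

```c
#include <stddef.h>

/* Hypothetical ops table; the real one has more members. */
struct demo_vpmu_ops {
    int (*do_interrupt)(int cpu);
};

/* Hypothetical per-vCPU PMU state holding a pointer to its ops. */
struct demo_vpmu {
    const struct demo_vpmu_ops *ops;
};

/* Stand-in handler for a fully initialized vPMU. */
int demo_handler(int cpu)
{
    (void)cpu;
    return 1; /* 1 means "interrupt was handled" */
}

/* Ops table with a handler, and one without (the problematic case). */
const struct demo_vpmu_ops demo_full_vpmu_ops = { .do_interrupt = demo_handler };
const struct demo_vpmu_ops demo_no_vpmu_ops = { .do_interrupt = NULL };

/*
 * Dispatch a PMU interrupt. Returns 0 without calling anything when no
 * handler is installed, instead of jumping through a NULL pointer.
 */
int demo_vpmu_do_interrupt(const struct demo_vpmu *vpmu, int cpu)
{
    if (vpmu == NULL || vpmu->ops == NULL || vpmu->ops->do_interrupt == NULL)
        return 0;

    return vpmu->ops->do_interrupt(cpu);
}
```

With a struct demo_vpmu pointing at demo_no_vpmu_ops the call returns 0
rather than faulting at address 0. Whether the check belongs here or
where LVTPC is updated is the open question from the thread, and this
sketch does not settle it.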