From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [Xen-users] Xen 4.3.1 / Linux 3.12 panic
Date: Wed, 6 Nov 2013 15:44:58 -0500
Message-ID: <20131106204458.GA22742@phenom.dumpdata.com>
References: <20131105131934.GB7624@uil.winnipeg.nl>
 <20131106091247.GC7624@uil.winnipeg.nl>
 <1383730899.26213.16.camel@kazak.uk.xensource.com>
 <20131106102004.GD7624@uil.winnipeg.nl>
 <1383735067.26213.65.camel@kazak.uk.xensource.com>
 <20131106132526.GE7624@uil.winnipeg.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20131106132526.GE7624@uil.winnipeg.nl>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: xen-users@lists.xen.org, xen-devel@lists.xen.org, jbeulich@suse.com, insong.liu@intel.com, benv-xensource.com@junerules.com
List-Id: xen-devel@lists.xenproject.org

On Wed, Nov 06, 2013 at 02:25:28PM +0100, Wouter de Geus wrote:
> * Ian Campbell [2013-11-06 10:51:07 +0000]:
>
> > > If this turns out to be stable I'll try again with cpufreq=dom0 to see if
> > > that's also stable. I'll report my findings if you care.
> >
> > Please do.
>
> With cpufreq=none I've been able to run through a windows 2008 installation
> and some kernel compiles without problems. After that I rebooted with
> cpufreq=dom0, and within 5 minutes ran into the first oops again:

Is there a particular reason you tried 'cpufreq'? Sorry if that was
answered earlier.
>
> [ 428.105061] BUG: unable to handle kernel paging request at ffffea0000dd8a48
> [ 428.105103] IP: [] unmap_single_vma+0x426/0x820
> [ 428.105115] PGD 1281d6067 PUD 1281d5067 PMD 1281ce067 PTE 801000097bf53068
> [ 428.105123] Oops: 0000 [#1] SMP
> [ 428.105127] Modules linked in:
> [ 428.105133] CPU: 3 PID: 1786 Comm: sh Not tainted 3.12.0-Desman #32
> [ 428.105138] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 3.0 09/10/2012
> [ 428.105142] task: ffff88011dbb1590 ti: ffff8800d5088000 task.ti: ffff8800d5088000
> [ 428.105147] RIP: e030:[] [] unmap_single_vma+0x426/0x820
> [ 428.105154] RSP: e02b:ffff8800d5089d30 EFLAGS: 00010246
> [ 428.105157] RAX: 80000008002db165 RBX: ffff8800d2ad0d60 RCX: 0000000000dd8a40
> [ 428.105161] RDX: 80000008002db165 RSI: 0000000001fac000 RDI: 80000008002db165
> [ 428.105165] RBP: ffffea0000dd8a40 R08: ffff8800d2b52cf0 R09: 00000000fffffffa
> [ 428.105169] R10: 0000000000000a6f R11: 00000063ad0a7abc R12: 0000000001fe5000
> [ 428.105173] R13: ffffc00000000fff R14: 0000000001fac000 R15: ffff8800d5089e40
> [ 428.105181] FS: 00002b839c48c600(0000) GS:ffff880122a60000(0000) knlGS:0000000000000000
> [ 428.105186] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 428.105215] CR2: ffffea0000dd8a48 CR3: 00000000021de000 CR4: 0000000000040660
> [ 428.105220] Stack:
> [ 428.105222] ffff8800d6961c00 0000000000000000 ffff8800d2b52cf0 0000000000000000
> [ 428.105229] ffffea00034ab430 80000008002db165 ffff8800c331c078 0000000001fe5000
> [ 428.105236] ffff880000000000 00003ffffffff000 ffff88011dbb1590 0000000001fe4fff
> [ 428.105242] Call Trace:
> [ 428.105248] [] ? unmap_vmas+0x41/0x90
> [ 428.105254] [] ? exit_mmap+0x8a/0x150
> [ 428.105261] [] ? mmput+0x49/0x100
> [ 428.105267] [] ? do_exit+0x273/0xa30
> [ 428.105273] [] ? vtime_account_user+0x45/0x60
> [ 428.105278] [] ? do_group_exit+0x34/0xa0
> [ 428.105284] [] ? SyS_exit_group+0xb/0x10
> [ 428.105290] [] ? tracesys+0xe1/0xe6
> [ 428.105294] Code: 48 8b 3c 24 4c 89 f6 48 89 da 66 66 66 90 66 66 90 41 80 4f 18 01 48 85 ed 0f 84 7a ff ff ff 48 83 7c 24 18 00 0f 85 02 03 00 00 45 08 01 0f 84 70 01 00 00 48 89 ef ff 8c 24 98 00 00 00 e8
> [ 428.105347] RIP [] unmap_single_vma+0x426/0x820
> [ 428.105353] RSP
> [ 428.105356] CR2: ffffea0000dd8a48
> [ 428.105360] ---[ end trace 81935aa1c6524ae3 ]---
>
> > I suspect it shouldn't be necessary to use command lines to override
> > these things, but I've no idea how to diagnose this further.
>
> Removing the entire cpufreq part from my dom0 kernel might help :)
> But then again, if that's a problem I would like the hypervisor to detect
> and avoid this problem if that's possible.

So cpufreq=dom0 is kind of a no-op, as the Linux kernel will disable
the native CPUfreq machinery. This is done because it does not make
sense for Linux dom0 to control the CPU frequency when it has no idea
of the workloads (the hypervisor does).

But with 'cpufreq=dom0' you are getting faults. So the other question
is - does anything happen if you disable ACPI power states in the BIOS?

>
> > Once you have the findings if you could post a summary to xen-devel and
> > CC jbeulich@suse.com & insong.liu@intel.com (cpufreq/power mgmt
> > maintainers) perhaps they can advise.
>
> Summary:
> --------
> The issue: Xen 4.3.1 and my Linux 3.12 build (with cpufreq) panics (page
> requests, GPF, bad page state) usually within a few minutes.
> When Xen is booted with cpufreq=none the problem seems to disappear, with
> cpufreq=dom0 the problem is still there.
> The machine I run this on is a dual opteron 6212 with 64GB ECC RAM on a
> Supermicro H8DGi board.
>
> Regards,
>
> Wouter.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel