From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: dom0 pvops crash apparently due to guest migration
Date: Mon, 29 Nov 2010 10:53:48 -0800
Message-ID: <4CF3F6BC.3020906@goop.org>
References: <19699.38294.527879.943733@mariner.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <19699.38294.527879.943733@mariner.uk.xensource.com>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 11/29/2010 03:59 AM, Ian Jackson wrote:
> One of my test boxes encountered the crash whose oops you see below.
> It doesn't do it every time, or even every time on this machine (since
> the credit2 test in the same run worked).  The crash seems to have
> occurred just at the end of the migration of a PV guest.

Do you have a feel for how often it fails?  Has this only started
happening recently?

> The setup is 32-bit dom0 and domU on 64-bit Xen.
> The pvops kernel version was 56eabf9f2a6632d3b2ef.
>
> The complete logs are here:
>   http://www.chiark.greenend.org.uk/~xensrcts/logs/2847/test-amd64-i386-xl-multivcpu/
> (The machine has since been reused so those logs are what there is.)
>
> Ian.
>
> ------------[ cut here ]------------
> kernel BUG at arch/x86/mm/fault.c:210!
> invalid opcode: 0000 [#1] SMP 
> last sysfs file: /sys/devices/virtual/net/lo/operstate
> Modules linked in: e1000e [last unloaded: scsi_wait_scan]
>
> Pid: 22, comm: xenwatch Not tainted (2.6.32.26 #1)         
> EIP: 0061:[<c104c058>] EFLAGS: 00010082 CPU: 0
> EIP is at vmalloc_sync_one+0x118/0x128
> EAX: 003f8360 EBX: 1fc1b067 ECX: ffffffe0 EDX: ab273fff
> ESI: 00000000 EDI: c182adf0 EBP: dfcdbe88 ESP: dfcdbe64
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process xenwatch (pid: 22, ti=dfcda000 task=dfccc510 task.ti=dfcda000)
> Stack:
>  dbd7b384 00cdbe88 00000000 c568f200 dbd7b384 ab273fff f7c00000 c568f200
> <0> dbd7b384 dfcdbea8 c104ca9a c182adf0 c1780204 dbd75f40 dfd45a20 dbd75f40
> <0> dfcdbf5c dfcdbeb4 c10df14a dfcdbf1c dfcdbef8 c12313b1 0000001b 00000008
> Call Trace:
>  [<c104ca9a>] ? vmalloc_sync_all+0x5c/0xbe
>  [<c10df14a>] ? alloc_vm_area+0x44/0x4b

Hm, I'm still not really sure why alloc_vm_area() calls
vmalloc_sync_all() in the first place...  But that BUG shouldn't
trigger regardless.

    J

>  [<c12313b1>] ? blkif_map+0x2d/0x204
>  [<c1230cbb>] ? frontend_changed+0x194/0x209
>  [<c1229b39>] ? xenbus_otherend_changed+0x5c/0x61
>  [<c1229c97>] ? frontend_changed+0xa/0xd
>  [<c1228783>] ? xenwatch_thread+0xf6/0x11e
>  [<c10795df>] ? autoremove_wake_function+0x0/0x33
>  [<c122868d>] ? xenwatch_thread+0x0/0x11e
>  [<c1079397>] ? kthread+0x61/0x66
>  [<c1079336>] ? kthread+0x0/0x66
>  [<c1030dd7>] ? kernel_thread_helper+0x7/0x10
> Code: eb fe 89 d8 89 f2 ff 15 08 7d 68 c1 89 d6 8b 55 f0 89 c3 89 c8 0f ac d0 0c 89 c1 89 d8 0f ac f0 0c c1 e1 05 c1 e0 05 39 c1 74 06 <0f> 0b eb fe 31 ff 83 c4 18 89 f8 5b 5e 5f 5d c3 55 89 e5 56 53 
> EIP: [<c104c058>] vmalloc_sync_one+0x118/0x128 SS:ESP 0069:dfcdbe64
> ---[ end trace 7b608ed9c5e5ed4e ]---
>