From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1MBZLI-00073M-Ag
	for qemu-devel@nongnu.org; Tue, 02 Jun 2009 15:08:12 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1MBZLC-0006x2-Nr
	for qemu-devel@nongnu.org; Tue, 02 Jun 2009 15:08:12 -0400
Received: from [199.232.76.173] (port=55120 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1MBZLC-0006wt-K8
	for qemu-devel@nongnu.org; Tue, 02 Jun 2009 15:08:06 -0400
Received: from mx2.redhat.com ([66.187.237.31]:42345)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <avi@redhat.com>) id 1MBZLB-0007XK-Rk
	for qemu-devel@nongnu.org; Tue, 02 Jun 2009 15:08:06 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26])
	by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n52J852b016415
	for <qemu-devel@nongnu.org>; Tue, 2 Jun 2009 15:08:05 -0400
Message-ID: <4A257890.3000706@redhat.com>
Date: Tue, 02 Jun 2009 22:08:00 +0300
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] i586 TCG: boot hangs intermittently on cryptomgr_test
	at doublefault_fn
References: <20090602175833.GA26882@amd.home.annexia.org>
In-Reply-To: <20090602175833.GA26882@amd.home.annexia.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Richard W.M. Jones" <rjones@redhat.com>
Cc: qemu-devel@nongnu.org

Richard W.M. Jones wrote:
> I have this bug[1] apparently in qemu which I'm trying to track down:
>
> ----------------------------------------------------------------------
> apm: BIOS version 1.2 Flags 0x03 (Driver version 1.16ac)
> apm: overridden by ACPI.
> audit: initializing netlink socket (disabled)
> type=2000 audit(1243614582.002:1): initialized
> HugeTLB registered 4 MB page size, pre-allocated 0 pages
> VFS: Disk quotas dquot_6.5.2
> Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> msgmni has been set to 680
> BUG: unable to handle kernel NULL pointer dereference at 00000014
> IP: [<c041ddd9>] doublefault_fn+0xd/0x108
> *pde = 00000000 
> Oops: 0000 [#1] SMP 
> last sysfs file: 
> Modules linked in:
>
> Pid: 26, comm: cryptomgr_test Not tainted (2.6.30-0.91.rc7.git1.fc12.i586 #1) 
> EIP: 0060:[<c041ddd9>] EFLAGS: f8d8409e CPU: 0
> EIP is at doublefault_fn+0xd/0x108
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: c0be1e2c ESP: c0be1e18
>  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Process cryptomgr_test (pid: 26, ti=c0be0000 task=d5418000 task.ti=d5b88000)
> Stack:
>  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> Code: c2 eb 00 ba b8 dd 41 c0 ff e2 8d 15 e4 61 99 c0 8b 0a 51 8d 15 e0 61 99
> c0 8b 0a 51 c3 90 55 89 e5 56 53 83 ec 0c 0f 1f 44 00 00 <65> a1 14 00 00 00 89
> 45 f4 31 c0 8d 45 ee 66 c7 45 ee 00 00 c7 
> EIP: [<c041ddd9>] doublefault_fn+0xd/0x108 SS:ESP 0068:c0be1e18
> CR2: 0000000000000014
> ---[ end trace 6d450e935ee1897c ]---
> cryptomgr_test used greatest stack depth: 7348 bytes left
> ----------------------------------------------------------------------
>
> It seems to be: i386 architecture only, software emulation, and
> intermittent, quite hard to reproduce reliably.
>
> So my questions are: Has anyone seen anything like this before?
> Is there anything I can set or enable to track down which instructions
> are failing?
>   

The faulting instruction accesses gs:0x14.  Can you expand the register 
printout code to include the full information for the segment cache 
(base, limit, type, etc.)?
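
For reference, the gs:0x14 reading comes from the bytes at the <65> marker in the Code: line: 0x65 is the GS segment-override prefix and 0xa1 is the 32-bit "mov moffs32, %eax" opcode, followed by a little-endian offset. Below is a minimal Python sketch of that decode. The helper names are hypothetical and it handles only this one instruction form. Since CR2 holds the faulting linear address, and the linear address is segment base plus offset, CR2 = 0x14 would suggest a cached GS base of 0 (assuming no 32-bit wraparound), which is why the segment cache contents are of interest.

```python
# Code bytes around the fault, copied from the oops above. The byte in
# angle brackets is where EIP pointed when the fault was taken.
CODE = ("c0 8b 0a 51 c3 90 55 89 e5 56 53 83 ec 0c 0f 1f 44 00 00 "
        "<65> a1 14 00 00 00 89 45 f4 31 c0")

# x86 segment-override prefix bytes.
SEGMENT_PREFIXES = {0x26: "es", 0x2e: "cs", 0x36: "ss",
                    0x3e: "ds", 0x64: "fs", 0x65: "gs"}


def decode_faulting_insn(code):
    """Decode the instruction at the <..> marker (hypothetical helper).

    Handles only an optional segment-override prefix followed by
    opcode 0xa1 (mov moffs32 -> eax in 32-bit mode).
    Returns (segment name, offset).
    """
    tokens = code.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = [int(t.strip("<>"), 16) for t in tokens[start:]]

    seg = "ds"  # default segment for moffs accesses
    pos = 0
    if data[pos] in SEGMENT_PREFIXES:
        seg = SEGMENT_PREFIXES[data[pos]]
        pos += 1
    if data[pos] != 0xa1:
        raise ValueError("unsupported opcode: %#x" % data[pos])

    offset = int.from_bytes(bytes(data[pos + 1:pos + 5]), "little")
    return seg, offset


def implied_segment_base(cr2, offset):
    """Segment base implied by linear address = base + offset (mod 2**32)."""
    return (cr2 - offset) & 0xFFFFFFFF


if __name__ == "__main__":
    seg, offset = decode_faulting_insn(CODE)
    print("mov %%%s:%#x,%%eax" % (seg, offset))
    print("implied %s base: %#x" % (seg, implied_segment_base(0x14, offset)))
```

This only confirms what the oops already shows. It cannot tell whether the GS descriptor in the GDT or the emulated segment cache is at fault, so the full base/limit/type dump is still needed.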

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.