From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LfFwu-0008IX-6p
	for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:28 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LfFwt-0008HT-8I
	for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:27 -0500
Received: from [199.232.76.173] (port=38401 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LfFwt-0008HK-0R
	for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:27 -0500
Received: from ns.suse.de ([195.135.220.2]:53725 helo=mx1.suse.de)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <agraf@suse.de>) id 1LfFws-0007He-BI
	for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:26 -0500
Message-ID: <49AFF663.6020006@suse.de>
Date: Thu, 05 Mar 2009 16:57:23 +0100
From: Alexander Graf <agraf@suse.de>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 7/7] PPC64: Don't fault at lwsync
References: <1236262454-6293-1-git-send-email-agraf@suse.de>
	<1236262454-6293-7-git-send-email-agraf@suse.de>
	<1236262454-6293-8-git-send-email-agraf@suse.de>
	<200903051507.30592.paul@codesourcery.com>
In-Reply-To: <200903051507.30592.paul@codesourcery.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paul Brook <paul@codesourcery.com>
Cc: blauwirbel@gmail.com, Alexander Graf <alex@csgraf.de>, qemu-devel@nongnu.org

Paul Brook wrote:
> On Thursday 05 March 2009, Alexander Graf wrote:
>   
>> Right now we can throw a fault on lwsync, even though the fault is
>> actually caused by the instruction after lwsync.
>>
>> I haven't found the magic that messed this up, but for now we can
>> just end the TB on lwsync, forcing the next command to issue faults
>> itself.
>>
>> If anyone knows how to really fix this, please step forward and do
>> so. This only makes things work at all for me :-).
>>     
>
> Where is the subsequent fault coming from? I suspect the real bug is nothing 
> to do with lwsync, and the subsequent fault is actually just corrupting the 
> CPU state. As discussed recently this is the same bug SPARC has with its 
> unassigned access handlers.
>
> Paul
>   

Without the patch I get:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0000000000ba524
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA PowerMac
Modules linked in:
Supported: Yes
NIP: c0000000000ba524 LR: c000000000775a0c CTR: c0000000007759e8
REGS: c0000000061afb10 TRAP: 0300   Not tainted  (2.6.27.7-9-ppc64)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 84000044  XER: 20000000
DAR: 0000000000000000, DSISR: 0000000040000000
TASK = c00000000619d560[1] 'swapper' THREAD: c0000000061ac000 CPU: 0
GPR00: ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
GPR08: 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
GPR12: 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
GPR16: 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
GPR20: 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
GPR24: 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
GPR28: c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
NIP [c0000000000ba524] .cmpxchg_futex_value_locked+0x38/0x78
LR [c000000000775a0c] .futex_init+0x24/0xac
Call Trace:
[c0000000061afd90] [c0000000007759c0] .init_tstats_procfs+0x2c/0x54 (unreliable)
[c0000000061afe10] [c00000000000944c] .do_one_initcall+0x78/0x194
[c0000000061aff00] [c000000000750440] .kernel_init+0xd0/0x148
[c0000000061aff90] [c00000000002ad84] .kernel_thread+0x4c/0x68
Instruction dump:
39290001 912b0014 7c8407b4 7ca507b4 e92d01b0 e8090520 7fa30040 419d0038
e92d01b0 e8090520 2ba00003 409d0028 <7c2004ac> 7c001828 7c002000 40c20010
---[ end trace 561bb236c800851f ]---
note: swapper[1] exited with preempt_count 1
swapper used greatest stack depth: 9296 bytes left
Kernel panic - not syncing: Attempted to kill init!


Which is this translation block:

NIP c0000000000ba524   LR c000000000775a0c CTR c0000000007759e8 XER 20000000
MSR 8000000000009032 HID0 0000000060000000  HF 8000000000000000 idx 1
TB 00000000 d8b159bb DECR 0007c417
GPR00 ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
GPR04 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
GPR08 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
GPR12 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
GPR16 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
GPR20 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
GPR24 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
GPR28 c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
CR 84000044  [ L  G  -  -  -  -  G  G  ]             RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 c000000000774950 SRR1 8000000000009032 SDR1 0000000007c00003
IN:
0xc0000000000ba524:  lwsync
0xc0000000000ba528:  lwarx   r0,0,r3
0xc0000000000ba52c:  cmpw    r0,r4
0xc0000000000ba530:  bne-    0xc0000000000ba540


And I seriously have trouble understanding how a data storage exception
could be raised by the lwsync opcode itself. It does look like r3 became 0
from the guest's point of view after the lwsync, though - hum.
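
The fault registers in the oops fit that reading: DAR is 0, GPR03 is 0 in both dumps, and DSISR is 0x40000000. Below is a rough sketch of how those values can be interpreted. The helper describe_dsi is hypothetical, and the bit masks and their names are assumptions based on the usual 64-bit hash-MMU DSISR layout (similar to the definitions in Linux's powerpc headers), so they should be checked against the architecture manual before being relied on.

```python
# Assumed DSISR bit masks for a 64-bit hash-MMU data storage interrupt.
DSISR_NOHPTE = 0x40000000     # no translation (page table entry) found
DSISR_PROTFAULT = 0x08000000  # access not permitted by page protection
DSISR_ISSTORE = 0x02000000    # faulting access was a store


def describe_dsi(dsisr, dar):
    """Summarise a data storage interrupt from its DSISR and DAR values.

    Hypothetical helper for illustration; only the three bits above are
    interpreted, everything else is reported as "other cause".
    """
    access = "store" if dsisr & DSISR_ISSTORE else "load"
    causes = []
    if dsisr & DSISR_NOHPTE:
        causes.append("no translation found")
    if dsisr & DSISR_PROTFAULT:
        causes.append("protection violation")
    if not causes:
        causes.append("other cause")
    return f"{access} at 0x{dar:016x}: {', '.join(causes)}"


if __name__ == "__main__":
    # Values taken from the "DAR: ..., DSISR: ..." line of the oops
    print(describe_dsi(0x40000000, 0x0000000000000000))
```

Under those assumptions the interrupt describes a load from address 0 with no translation, which is what lwarx r0,0,r3 would produce with r3 = 0, so only the reported NIP seems to be off by one instruction.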

Alex