From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1LfFwu-0008IX-6p
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:28 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1LfFwt-0008HT-8I
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:27 -0500
Received: from [199.232.76.173] (port=38401 helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1LfFwt-0008HK-0R
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:27 -0500
Received: from ns.suse.de ([195.135.220.2]:53725 helo=mx1.suse.de)
    by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32)
    (Exim 4.60) (envelope-from )
    id 1LfFws-0007He-BI
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 10:57:26 -0500
Message-ID: <49AFF663.6020006@suse.de>
Date: Thu, 05 Mar 2009 16:57:23 +0100
From: Alexander Graf
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 7/7] PPC64: Don't fault at lwsync
References: <1236262454-6293-1-git-send-email-agraf@suse.de>
    <1236262454-6293-7-git-send-email-agraf@suse.de>
    <1236262454-6293-8-git-send-email-agraf@suse.de>
    <200903051507.30592.paul@codesourcery.com>
In-Reply-To: <200903051507.30592.paul@codesourcery.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Paul Brook
Cc: blauwirbel@gmail.com, Alexander Graf , qemu-devel@nongnu.org

Paul Brook wrote:
> On Thursday 05 March 2009, Alexander Graf wrote:
>
>> Right now we can throw a fault on lwsync, even though the fault is
>> actually caused by the instruction after lwsync.
>>
>> I haven't found the magic that messed this up, but for now we can
>> just end the TB on lwsync, forcing the next command to issue faults
>> itself.
>>
>> If anyone knows how to really fix this, please step forward and do
>> so. This only makes things work at all for me :-).
>>
>
> Where is the subsequent fault coming from? I suspect the real bug is nothing
> to do with lwsync, and the subsequent fault is actually just corrupting the
> CPU state. As discussed recently this is the same bug SPARC has with its
> unassigned access handlers.
>
> Paul
>

Without the patch I get:

Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc0000000000ba524
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA PowerMac
Modules linked in:
Supported: Yes
NIP: c0000000000ba524 LR: c000000000775a0c CTR: c0000000007759e8
REGS: c0000000061afb10 TRAP: 0300 Not tainted (2.6.27.7-9-ppc64)
MSR: 8000000000009032 CR: 84000044 XER: 20000000
DAR: 0000000000000000, DSISR: 0000000040000000
TASK = c00000000619d560[1] 'swapper' THREAD: c0000000061ac000 CPU: 0
GPR00: ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
GPR08: 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
GPR12: 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
GPR16: 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
GPR20: 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
GPR24: 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
GPR28: c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
NIP [c0000000000ba524] .cmpxchg_futex_value_locked+0x38/0x78
LR [c000000000775a0c] .futex_init+0x24/0xac
Call Trace:
[c0000000061afd90] [c0000000007759c0] .init_tstats_procfs+0x2c/0x54 (unreliable)
[c0000000061afe10] [c00000000000944c] .do_one_initcall+0x78/0x194
[c0000000061aff00] [c000000000750440] .kernel_init+0xd0/0x148
[c0000000061aff90] [c00000000002ad84] .kernel_thread+0x4c/0x68
Instruction dump:
39290001 912b0014 7c8407b4 7ca507b4 e92d01b0 e8090520 7fa30040 419d0038
e92d01b0 e8090520 2ba00003 409d0028 <7c2004ac> 7c001828 7c002000 40c20010
---[ end trace 561bb236c800851f ]---
note: swapper[1] exited with preempt_count 1
swapper used greatest stack depth: 9296 bytes left
Kernel panic - not syncing: Attempted to kill init!

Which is this translation block:

NIP c0000000000ba524 LR c000000000775a0c CTR c0000000007759e8 XER 20000000
MSR 8000000000009032 HID0 0000000060000000 HF 8000000000000000 idx 1
TB 00000000 d8b159bb DECR 0007c417
GPR00 ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
GPR04 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
GPR08 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
GPR12 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
GPR16 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
GPR20 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
GPR24 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
GPR28 c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
CR 84000044 [ L G - - - - G G ] RES ffffffffffffffff
FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
FPSCR 00000000
SRR0 c000000000774950 SRR1 8000000000009032 SDR1 0000000007c00003
IN:
0xc0000000000ba524: lwsync
0xc0000000000ba528: lwarx r0,0,r3
0xc0000000000ba52c: cmpw r0,r4
0xc0000000000ba530: bne- 0xc0000000000ba540

And I seriously have trouble understanding how a data storage exception
could happen on the lwsync opcode.
It looks like R3 became 0 from the guest's point of view after lwsync
though - hum.

Alex
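
For anyone wanting to cross-check the dump by hand, here is a minimal sketch that decodes the word marked in the "Instruction dump" (<7c2004ac>) and the word right after it (7c001828), and then computes the address the second one would access. The field layout is the standard Power ISA X-form; the helper names (fields, decode, effective_address) are made up for illustration and are not QEMU or kernel functions. Only the two opcodes relevant here are handled, and the GPR values are taken from the register dump above.

```python
# Sketch only: hypothetical helpers, not part of QEMU or the Linux kernel.

def fields(insn):
    """Split a 32-bit X-form instruction word into its fields."""
    return {
        "opcd": (insn >> 26) & 0x3F,   # primary opcode
        "rt":   (insn >> 21) & 0x1F,   # RT (for sync, the low two bits hold L)
        "ra":   (insn >> 16) & 0x1F,
        "rb":   (insn >> 11) & 0x1F,
        "xo":   (insn >> 1) & 0x3FF,   # extended opcode
    }

def decode(insn):
    """Decode only the sync family and lwarx; everything else is 'unknown'."""
    f = fields(insn)
    if f["opcd"] == 31 and f["xo"] == 598:          # sync L
        names = {0: "sync", 1: "lwsync", 2: "ptesync"}
        return names.get(f["rt"] & 0x3, "sync (reserved L)")
    if f["opcd"] == 31 and f["xo"] == 20:           # lwarx RT,RA,RB
        ra = "0" if f["ra"] == 0 else "r%d" % f["ra"]
        return "lwarx r%d,%s,r%d" % (f["rt"], ra, f["rb"])
    return "unknown (0x%08x)" % insn

def effective_address(insn, gpr):
    """EA = (RA|0) + RB for an X-form indexed access, truncated to 64 bits."""
    f = fields(insn)
    base = 0 if f["ra"] == 0 else gpr[f["ra"]]
    return (base + gpr[f["rb"]]) & 0xFFFFFFFFFFFFFFFF

# GPR00..GPR03 as printed in the dump; the remaining registers are not needed.
gpr = [0xffffffffffffffff, 0xc0000000061afd90, 0xc0000000009bbce8, 0x0]

print(decode(0x7c2004ac))
print(decode(0x7c001828))
print(hex(effective_address(0x7c001828, gpr)))
```

With GPR03 as shown in the dump, the lwarx address comes out equal to the reported DAR, which would fit the fault belonging to lwarx while NIP still points at lwsync. That is only a consistency check on the numbers, not an explanation of why the NIP is off by one instruction.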