From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1LfG8k-0002oT-Ly
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 11:09:42 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1LfG8j-0002o9-US
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 11:09:42 -0500
Received: from [199.232.76.173] (port=55095 helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1LfG8j-0002o3-QO
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 11:09:41 -0500
Received: from mx2.suse.de ([195.135.220.15]:43804)
    by monty-python.gnu.org with esmtps
    (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
    (envelope-from )
    id 1LfG8j-0000Lv-5K
    for qemu-devel@nongnu.org; Thu, 05 Mar 2009 11:09:41 -0500
Message-ID: <49AFF942.6000708@suse.de>
Date: Thu, 05 Mar 2009 17:09:38 +0100
From: Alexander Graf
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH 7/7] PPC64: Don't fault at lwsync
References: <1236262454-6293-1-git-send-email-agraf@suse.de>
    <1236262454-6293-7-git-send-email-agraf@suse.de>
    <1236262454-6293-8-git-send-email-agraf@suse.de>
    <200903051507.30592.paul@codesourcery.com>
    <49AFF663.6020006@suse.de>
In-Reply-To: <49AFF663.6020006@suse.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: blauwirbel@gmail.com, Alexander Graf, Paul Brook

Alexander Graf wrote:
> Paul Brook wrote:
>
>> On Thursday 05 March 2009, Alexander Graf wrote:
>>
>>
>>> Right now we can throw a fault on lwsync, even though the fault is
>>> actually caused by the instruction after lwsync.
>>>
>>> I haven't found the magic that messed this up, but for now we can
>>> just end the TB on lwsync, forcing the next command to issue faults
>>> itself.
>>>
>>> If anyone knows how to really fix this, please step forward and do
>>> so.
>>> This only makes things work at all for me :-).
>>>
>>>
>> Where is the subsequent fault coming from? I suspect the real bug is nothing
>> to do with lwsync, and the subsequent fault is actually just corrupting the
>> CPU state. As discussed recently this is the same bug SPARC has with its
>> unassigned access handlers.
>>
>> Paul
>>
>>
>
> Without the patch I get:
>
> Unable to handle kernel paging request for data at address 0x00000000
> Faulting instruction address: 0xc0000000000ba524
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA PowerMac
> Modules linked in:
> Supported: Yes
> NIP: c0000000000ba524 LR: c000000000775a0c CTR: c0000000007759e8
> REGS: c0000000061afb10 TRAP: 0300 Not tainted (2.6.27.7-9-ppc64)
> MSR: 8000000000009032 CR: 84000044 XER: 20000000
> DAR: 0000000000000000, DSISR: 0000000040000000
> TASK = c00000000619d560[1] 'swapper' THREAD: c0000000061ac000 CPU: 0
> GPR00: ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
> GPR04: 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
> GPR08: 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
> GPR12: 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
> GPR16: 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
> GPR20: 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
> GPR24: 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
> GPR28: c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
> NIP [c0000000000ba524] .cmpxchg_futex_value_locked+0x38/0x78
> LR [c000000000775a0c] .futex_init+0x24/0xac
> Call Trace:
> [c0000000061afd90] [c0000000007759c0] .init_tstats_procfs+0x2c/0x54 (unreliable)
> [c0000000061afe10] [c00000000000944c] .do_one_initcall+0x78/0x194
> [c0000000061aff00] [c000000000750440] .kernel_init+0xd0/0x148
> [c0000000061aff90] [c00000000002ad84] .kernel_thread+0x4c/0x68
> Instruction dump:
> 39290001 912b0014 7c8407b4 7ca507b4
> e92d01b0 e8090520 7fa30040 419d0038
> e92d01b0 e8090520 2ba00003 409d0028 <7c2004ac> 7c001828 7c002000 40c20010
> ---[ end trace 561bb236c800851f ]---
> note: swapper[1] exited with preempt_count 1
> swapper used greatest stack depth: 9296 bytes left
> Kernel panic - not syncing: Attempted to kill init!
>
>
> Which is this translation block:
>
> NIP c0000000000ba524 LR c000000000775a0c CTR c0000000007759e8 XER 20000000
> MSR 8000000000009032 HID0 0000000060000000 HF 8000000000000000 idx 1
> TB 00000000 d8b159bb DECR 0007c417
> GPR00 ffffffffffffffff c0000000061afd90 c0000000009bbce8 0000000000000000
> GPR04 0000000000000000 0000000000000000 0000000000000000 c000000000a82c80
> GPR08 0000000000000613 c00000000619d560 c0000000070704c0 c0000000061ac000
> GPR12 0000000088000044 c000000000a82c80 0000000000051b63 0000000000051a41
> GPR16 0000000000051b5b 000000000004003c 0000000000053958 000000000005345e
> GPR20 0000000000052fd4 0000000000063dc8 0000000000063db4 00000000fff0245c
> GPR24 4000000002110000 c0000000007932f8 c000000000b077a8 0000000000000000
> GPR28 c0000000009621f0 c0000000007b01c8 c000000000938f18 c0000000007afce0
> CR 84000044 [ L G - - - - G G ] RES ffffffffffffffff
> FPR00 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR04 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR08 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR12 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR20 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR24 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPR28 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> FPSCR 00000000
> SRR0 c000000000774950 SRR1 8000000000009032 SDR1 0000000007c00003
> IN:
> 0xc0000000000ba524: lwsync
> 0xc0000000000ba528: lwarx r0,0,r3
> 0xc0000000000ba52c: cmpw r0,r4
> 0xc0000000000ba530: bne- 0xc0000000000ba540
>
>
> And I seriously have trouble understanding how a data storage exception
> could happen on the lwsync opcode. It looks like R3 became 0 from the
> guest's point of view after lwsync though - hum.
>

Ah, I remember that one now :-).
The futex_init function tests whether cmpxchg works with NULL values, and
that's why R3 is 0. It's actually _supposed_ to fault here. But
something gets messed up when the fault is reported on IP=lwsync instead
of IP=lwarx, and I haven't really looked into why.

Alex
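
For readers following the thread, here is a rough user-space sketch of the probe described above. It is not the kernel code. The names cmpxchg_value_checked and probe_cmpxchg_enabled are made up for illustration. As far as I understand it, the real cmpxchg_futex_value_locked (the function named in the oops) lets the lwarx fault and recovers through an exception fixup. The explicit NULL check below only stands in for that path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's cmpxchg_futex_value_locked().
 * Assumption: the real routine takes the fault on lwarx and returns
 * -EFAULT via an exception fixup; here a NULL check models that.
 */
static int cmpxchg_value_checked(uint32_t *uaddr, uint32_t oldval,
                                 uint32_t newval)
{
    uint32_t cur;

    if (uaddr == NULL)
        return -EFAULT;      /* models the expected data storage fault */

    cur = *uaddr;
    if (cur == oldval)
        *uaddr = newval;     /* not atomic; fine for a single thread */
    return (int)cur;         /* previous value, as a cmpxchg reports */
}

/*
 * Boot-time style probe: call cmpxchg on NULL and treat the resulting
 * -EFAULT as proof that the operation is implemented.
 */
static int probe_cmpxchg_enabled(void)
{
    return cmpxchg_value_checked(NULL, 0, 0) == -EFAULT;
}
```

The point of the sketch is only that the NULL access is deliberate: the probe passes when the fault is delivered and recovered cleanly, which is why the fault has to be attributed to the lwarx and not to the lwsync before it.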