From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33) id 1CgVzl-00080W-Df for qemu-devel@nongnu.org; Mon, 20 Dec 2004 17:27:13 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33) id 1CgVzj-0007zG-5X for qemu-devel@nongnu.org; Mon, 20 Dec 2004 17:27:11 -0500
Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.33) id 1CgVzj-0007zD-2w for qemu-devel@nongnu.org; Mon, 20 Dec 2004 17:27:11 -0500
Received: from [64.233.184.201] (helo=wproxy.gmail.com) by monty-python.gnu.org with esmtp (Exim 4.34) id 1CgVlZ-0001Z4-JM for qemu-devel@nongnu.org; Mon, 20 Dec 2004 17:12:33 -0500
Received: by wproxy.gmail.com with SMTP id 63so115960wri for ; Mon, 20 Dec 2004 14:12:31 -0800 (PST)
Message-ID:
Date: Mon, 20 Dec 2004 23:12:31 +0100
From: Magnus Damm
In-Reply-To:
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
References:
Subject: [Qemu-devel] Re: [PATCH] CONFIG_MMU_MAP powerpc host support
Reply-To: Magnus Damm, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: Piotras
Cc: qemu-devel@nongnu.org

On Mon, 20 Dec 2004 22:31:34 +0100, Piotras wrote:
> Great!
>
> Do you have an estimate of the possible performance gain from
> introducing a direct pointer to mmu_map for memory reads?

No, I have no idea. The speed gain is probably host CPU specific.

I believe one downside of the current micro operation design is that the
compiler is unable to order the host instructions in an optimal way
because each micro operation is so damn small... And of course we want
them small. But if we could combine the most popular guest instructions
into one micro operation (which is not so micro anymore), then the
compiler could rearrange things to fully take advantage of the host CPU.
It all boils down to some table-based generic guest opcode matching code
that does a longest prefix match and supports masking of bitfields... I
think it would be very interesting to collect opcode statistics for
certain guest operating systems. Or maybe someone has already done that?

> I have two ideas for future experimentation.
>
> There is a trick possible without wasting another register for a
> global variable: use two copies of CPUState (one for privileged and
> another for user mode), then make mmu_map.add_read the first member of
> the struct. This would introduce guest register copying on
> user/supervisor switches, but maybe the performance gain would justify
> it.
>
> Another idea: if we could align add_read/add_write on a 64k boundary,
> the "addi" could be removed.

Yes, both are good ideas IMHO. Would your future experimentation improve
x86 performance? If this is powerpc-specific, then I think we could use
one or two registers (one for read and one for write) and modify these
registers each time the processor changes between user and kernel mode.

I think the big limitation right now for powerpc is that the good old
/* suppress this hack */ never worked out... And I also think we are
reaching a limit here: of course we would gain some by reducing the
number of instructions, but soon there are not many instructions left to
remove... So the bottleneck must be somewhere else.

/ magnus

> On Mon, 20 Dec 2004 18:55:21 +0100, Magnus Damm wrote:
> > Hello,
> >
> > This patch adds powerpc host support to the CONFIG_MMU_MAP patch
> > written by Piotrek. My patch should be applied on top of
> > v1-part[1-3].patch.gz. I have only tested the code with an x86 guest
> > on a ppc host running Linux - someone, please test on a host running
> > OSX.
> >
> > Performance gain reported by nbench:
> >
> > Memory index: 50%
> > Integer index: 44%
> > Fp index: 4%
> >
> > Right now each map-memory access consists of 5-6 powerpc
> > instructions.
> > If a direct pointer to mem_map could be kept in a register then we
> > would be down to 3-4 instructions per memory access...
> >
> > / magnus