From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1IgPBm-0002HE-F9 for qemu-devel@nongnu.org; Fri, 12 Oct 2007 14:24:46 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1IgPBk-0002Eg-LL for qemu-devel@nongnu.org; Fri, 12 Oct 2007 14:24:45 -0400 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1IgPBk-0002EU-HH for qemu-devel@nongnu.org; Fri, 12 Oct 2007 14:24:44 -0400 Received: from honiara.magic.fr ([195.154.193.36]) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1IgPBk-0006h9-4Q for qemu-devel@nongnu.org; Fri, 12 Oct 2007 14:24:44 -0400 Received: from [172.17.17.137] (gw.netgem.com [195.68.2.34]) by honiara.magic.fr (8.13.1/8.13.1) with ESMTP id l9CIOe2m019193 for ; Fri, 12 Oct 2007 20:24:40 +0200 Subject: Re: [Qemu-devel] RFC: Code fetch optimisation From: Jocelyn Mayer In-Reply-To: References: <1192178037.9976.259.camel@rapid> Content-Type: text/plain Date: Fri, 12 Oct 2007 20:24:44 +0200 Message-Id: <1192213484.12053.39.camel@jma4.dev.netgem.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote: > On 10/12/07, J. Mayer wrote: > > Here's a small patch that allow an optimisation for code fetch, at least > > for RISC CPU targets, as suggested by Fabrice Bellard. > > The main idea is that a translated block is never to span over a page > > boundary. As the tb_find_slow routine already gets the physical address > > of the page of code to be translated, the code translator could then > > fetch the code using raw host memory accesses instead of doing it > > through the softmmu routines. > > This patch could also be adapted to RISC CPU targets, with care for the > > last instruction of a page. For now, I did implement it for alpha, arm, > > mips, PowerPC and SH4. > > I don't actually know if the optimsation would bring a sensible speed > > gain or if it will be absolutelly marginal. > > > > Please comment. > > This will not work correctly for execution of MMIO registers, but > maybe that won't work on real hardware either. Who cares. I wonder if this is important or not... But maybe, when retrieving the physical address we could check if it is inside ROM/RAM or an I/O area and in the last case do not give the phys_addr information to the translator. In that case, it would go on using the ldxx_code. I guess if we want to do that, a set of helpers would be appreciated to avoid adding code like: if (phys_pc == 0) opc = ldul_code(virt_pc) else opc = ldul_raw(phys_pc) everywhere... I could also add another check so this set of macro would automatically use ldxx_code if we reach a page boundary, which would then make easy to use this optimisation for CISC/VLE architectures too. I'm not sure of the proper solution to allow executing code from mmio devices. But adding specific accessors to handle the CISC/VLE case is to be done. Something like this might be OK: static inline target_ulong ldl_code_p(unsigned long *start_pc, unsigned long *phys_pc, target_ulong *virt_pc) { target_ulong opc; if ((*start_pc ^ *phys_pc) & TARGET_PAGE_MASK) { /* slow path that may raise an exception */ opc = ldul_code(virt_pc); *start_pc = phys_pc; /* Avoid softmmu call on next load */ } else { /* Optimized path */ opc = ldul_raw(phys_pc); } *phys_pc += sizeof(target_ulong); *virt_pc += sizeof(target_ulong); return opc; } Of course, 8, 16 and 64 (why not ?) bits accessors would be also provided the same way. > Wouldn't it be even more efficient if you moved most of this calculation: > + phys_pc = (unsigned long)phys_ram_base + tb->page_addr[0] + > + (pc_start & ~TARGET_PAGE_MASK); > here: > + tb->page_addr[0] = phys_page1; > ? Maybe. I choosed to do this way because it's exactly the same assignment that is done in tb_link_phys after the gen_intermediate_code function returns. I then though that the safer thing to do was to store the same value so, whatever might happen, the value in the tb structure is never inconsistent. I also guess that it's not so important as the tb is not linked at this point...