Subject: Re: [Fwd: Re: [Qemu-devel] RFC: Code fetch optimisation]
From: "J. Mayer"
References: <1192230023.9976.291.camel@rapid>
Date: Sat, 13 Oct 2007 11:57:03 +0200
Message-Id: <1192269423.9976.310.camel@rapid>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

On Sat, 2007-10-13 at 10:11 +0300, Blue Swirl wrote:
> On 10/13/07, J. Mayer wrote:
> > -------- Forwarded Message --------
> > > From: Jocelyn Mayer
> > > Reply-To: l_indien@magic.fr, qemu-devel@nongnu.org
> > > To: qemu-devel@nongnu.org
> > > Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
> > > Date: Fri, 12 Oct 2007 20:24:44 +0200
> > >
> > > On Fri, 2007-10-12 at 18:21 +0300, Blue Swirl wrote:
> > > > On 10/12/07, J. Mayer wrote:
> > > > > Here's a small patch that allows an optimisation for code fetch, at least
> > > > > for RISC CPU targets, as suggested by Fabrice Bellard.
> > > > > The main idea is that a translated block must never span a page
> > > > > boundary. As the tb_find_slow routine already gets the physical address
> > > > > of the page of code to be translated, the code translator could then
> > > > > fetch the code using raw host memory accesses instead of doing it
> > > > > through the softmmu routines.
> > > > > This patch could also be adapted to CISC CPU targets, with care for the
> > > > > last instruction of a page. For now, I have implemented it for alpha, arm,
> > > > > mips, PowerPC and SH4.
> > > > > I don't actually know whether the optimisation would bring a noticeable
> > > > > speed gain or whether it will be absolutely marginal.
> > > > >
> > > > > Please comment.
> > > >
> > > > This will not work correctly for execution of MMIO registers, but
> > > > maybe that won't work on real hardware either. Who cares.
> > >
> > > I wonder if this is important or not... But maybe, when retrieving the
> > > physical address, we could check whether it is inside ROM/RAM or an I/O
> > > area and, in the latter case, not give the phys_addr information to the
> > > translator. In that case, it would go on using the ldxx_code. I guess if
> > > we want to do that, a set of helpers would be appreciated to avoid
> > > adding code like:
> > >     if (phys_pc == 0)
> > >         opc = ldul_code(virt_pc);
> > >     else
> > >         opc = ldul_raw(phys_pc);
> > > everywhere... I could also add another check so that this set of macros
> > > would automatically use ldxx_code if we reach a page boundary, which would
> > > then make it easy to use this optimisation for CISC/VLE architectures too.
> > >
> > > I'm not sure of the proper solution to allow executing code from MMIO
> > > devices. But adding specific accessors to handle the CISC/VLE case is to
> > > be done.
> >
> > [...]
> >
> > I did update my patch in this way and it's now able to run x86
> > and PowerPC targets.
> > PowerPC is the easy case, x86 is maybe the worst...
> > Well, I'm not really sure of what I've done for Sparc, but other targets
> > should be safe.
>
> It broke Sparc; delay slot handling makes things complicated. The
> updated patch passes my tests.

OK. I will take a look at how you solved this issue.

> For extra performance, I bypassed the ldl_code_p. On Sparc,
> instructions can't be split between two pages. Isn't translation
> always contained within the same page for all targets like Sparc?

Yes, for RISC targets running 32-bit code, we always stop translation
when we reach the end of a code page. The problem comes with CISC
architectures, like x86 or m68k, or RISC architectures running mixed
16/32-bit code, like ARM in Thumb mode or PowerPC in VLE mode. In all
those cases, there can be instructions spanning two pages, so we need
the ldx_code_p functions.
My reason for always using the ldx_code_p functions is that we may get
the chance to make them more clever and have the slow case handle code
execution in MMIO areas, when that becomes possible.

-- 
J. Mayer
Never organized
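
For readers following the thread, the helper idea discussed above can be
sketched as below. This is only an illustration under stated assumptions,
not the code from the patch: CodeFetch, fetch32 and slow_fetch_stub are
hypothetical names, a 4 KiB page is assumed, and the stub stands in for
the softmmu path (the ldxx_code routines mentioned in the thread). The
fast path reads straight from host memory and is taken only when a host
pointer for the page exists and the whole 32-bit access fits inside it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Hypothetical type for the softmmu slow path. */
typedef uint32_t (*slow_fetch_fn)(uint32_t virt_pc);

typedef struct CodeFetch {
    const uint8_t *host_page; /* host pointer to the page start, or NULL
                                 when the page is not RAM/ROM (e.g. MMIO) */
    uint32_t virt_page;       /* guest virtual address of the page start */
    slow_fetch_fn slow;       /* fallback used outside the fast path */
    unsigned slow_calls;      /* counts fallbacks, for demonstration */
} CodeFetch;

/* Stand-in slow path: a real one would go through the softmmu. */
static uint32_t slow_fetch_stub(uint32_t virt_pc)
{
    (void)virt_pc;
    return 0xDEADC0DEu;
}

/* Fetch a 32-bit opcode at virt_pc. No byte swapping is done here; a
 * real implementation would convert from guest to host byte order. */
static uint32_t fetch32(CodeFetch *cf, uint32_t virt_pc)
{
    /* Unsigned subtraction: an address below the page wraps to a large
     * offset and therefore falls through to the slow path. */
    uint32_t off = virt_pc - cf->virt_page;

    assert(cf->slow != NULL);
    if (cf->host_page != NULL && off <= PAGE_SIZE - sizeof(uint32_t)) {
        uint32_t opc;
        memcpy(&opc, cf->host_page + off, sizeof(opc));
        return opc;
    }
    /* MMIO page, or an access that would cross the page boundary. */
    cf->slow_calls++;
    return cf->slow(virt_pc);
}
```

With this shape, a target whose instructions never cross a page always
stays on the fast path for RAM/ROM pages, while CISC or mixed 16/32-bit
targets take the fallback only for the last bytes of a page.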