From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1IhSNh-0001Zj-Ly for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:25 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1IhSNf-0001Yt-PI for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:25 -0400 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1IhSNf-0001YT-FR for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:23 -0400 Received: from mail.codesourcery.com ([65.74.133.4]) by monty-python.gnu.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60) (envelope-from ) id 1IhSNe-0000ZR-R3 for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:23 -0400 From: Paul Brook Subject: Re: [Qemu-devel] RFC: Code fetch optimisation Date: Mon, 15 Oct 2007 17:01:16 +0100 References: <1192362267.9976.383.camel@rapid> <200710150330.55998.paul@codesourcery.com> <1192450140.9976.411.camel@rapid> In-Reply-To: <1192450140.9976.411.camel@rapid> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200710151701.17822.paul@codesourcery.com> Reply-To: qemu-devel@nongnu.org List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Cc: "J. Mayer" > > > + unsigned long phys_pc; > > > + unsigned long phys_pc_start; > > > > These are ram offsets, not physical addresses. I recommend naming them = as > > such to avoid confusion. > > Well, those are host addresses. Fabrice even suggested me to replace > them with void * to prevent confusion, but I kept using unsigned long > because the _p functions API do not use pointers. As those values are > defined as phys_ram_base + offset, those are likely to be host address, > not RAM offset, and are used directly to dereference host pointers in > the ldxxx_p functions. Did I miss something ? You are correct, they are host addresses. I still think calling them phys_p= c=20 is confusing. It took me a while to convince myself that "unsigned long" wa= s=20 an appropriate type (ignoring 64-bit windows hosts for now). How about host_pc? > > > + =A0 =A0/* Avoid softmmu access on next load */ > > > + =A0 =A0/* XXX: dont: phys PC is not correct anymore > > > + =A0 =A0 * =A0 =A0 =A0We could call get_phys_addr_code(env, pc); and= remove the > > > else + =A0 =A0 * =A0 =A0 =A0condition, here. > > > + =A0 =A0 */ > > > + =A0 =A0//*start_pc =3D phys_pc; > > > > The commented out code is completely bogus, please remove it. The comme= nt > > is also somewhat misleading/incorrect. The else would still be required > > for accesses that span a page boundary. > > I guess trying to optimize this case retrieving the physical address > would not bring any optimization as in fact only the last translated > instruction of a TB (then only a few code loads) may hit this case. VLE targets (x86, m68k) can translate almost a full page of instructions, a= nd=20 a page boundary can be anywhere within that block. Once we've spanned=20 multiple pages there's not point stopping translation immediately. We may a= s=20 well translate as many instructions as we can on the second page. I'd guess most TB are much smaller than a page, so on average only a few=20 instructions are going to come after the page boundary. > I'd like to keep a comment here to show that it may not be a good idea > (or may not be as simple as it seems at first sight) to try to do more > optimisation here, but you're right this comment is not correct. Agreed. > > The code itself looks ok, though I'd be surprised if it made a > > significant difference. We're always going to hit the fast-path TLB > > lookup case anyway. > > It seems that the generated code for the code fetch is much more > efficient than the one generated when we get when using the softmmu > routines. But it's true we do not get any significant performance boost. > As it was previously mentioned, the idea of the patch is more a 'don't > do unneeded things during code translation' than a great performance > improvment. OTOH it does make the the code more complicated. I'm agnostic about whether= =20 this patch should be applied. Paul