From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1IhSNh-0001Zj-Ly
	for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:25 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1IhSNf-0001Yt-PI
	for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:25 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1IhSNf-0001YT-FR
	for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:23 -0400
Received: from mail.codesourcery.com ([65.74.133.4])
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <paul@codesourcery.com>) id 1IhSNe-0000ZR-R3
	for qemu-devel@nongnu.org; Mon, 15 Oct 2007 12:01:23 -0400
From: Paul Brook <paul@codesourcery.com>
Subject: Re: [Qemu-devel] RFC: Code fetch optimisation
Date: Mon, 15 Oct 2007 17:01:16 +0100
References: <1192362267.9976.383.camel@rapid>
	<200710150330.55998.paul@codesourcery.com>
	<1192450140.9976.411.camel@rapid>
In-Reply-To: <1192450140.9976.411.camel@rapid>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200710151701.17822.paul@codesourcery.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: "J. Mayer" <l_indien@magic.fr>

> > > +    unsigned long phys_pc;
> > > +    unsigned long phys_pc_start;
> >
> > These are ram offsets, not physical addresses. I recommend naming them =
as
> > such to avoid confusion.
>
> Well, those are host addresses. Fabrice even suggested me to replace
> them with void * to prevent confusion, but I kept using unsigned long
> because the _p functions API do not use pointers. As those values are
> defined as phys_ram_base + offset, those are likely to be host address,
> not RAM offset, and are used directly to dereference host pointers in
> the ldxxx_p functions. Did I miss something ?

You are correct, they are host addresses. I still think calling them phys_p=
c=20
is confusing. It took me a while to convince myself that "unsigned long" wa=
s=20
an appropriate type (ignoring 64-bit windows hosts for now).

How about host_pc?

> > > + =A0 =A0/* Avoid softmmu access on next load */
> > > + =A0 =A0/* XXX: dont: phys PC is not correct anymore
> > > + =A0 =A0 * =A0 =A0 =A0We could call get_phys_addr_code(env, pc); and=
 remove the
> > > else + =A0 =A0 * =A0 =A0 =A0condition, here.
> > > + =A0 =A0 */
> > > + =A0 =A0//*start_pc =3D phys_pc;
> >
> > The commented out code is completely bogus, please remove it. The comme=
nt
> > is also somewhat misleading/incorrect. The else would still be required
> > for accesses that span a page boundary.
>
> I guess trying to optimize this case retrieving the physical address
> would not bring any optimization as in fact only the last translated
> instruction of a TB (then only a few code loads) may hit this case.

VLE targets (x86, m68k) can translate almost a full page of instructions, a=
nd=20
a page boundary can be anywhere within that block. Once we've spanned=20
multiple pages there's not point stopping translation immediately. We may a=
s=20
well translate as many instructions as we can on the second page.

I'd guess most TB are much smaller than a page, so on average only a few=20
instructions are going to come after the page boundary.

> I'd like to keep a comment here to show that it may not be a good idea
> (or may not be as simple as it seems at first sight) to try to do more
> optimisation here, but you're right this comment is not correct.

Agreed.

> > The code itself looks ok, though I'd be surprised if it made a
> > significant difference. We're always going to hit the fast-path TLB
> > lookup case anyway.
>
> It seems that the generated code for the code fetch is much more
> efficient than the one generated when we get when using the softmmu
> routines. But it's true we do not get any significant performance boost.
> As it was previously mentioned, the idea of the patch is more a 'don't
> do unneeded things during code translation' than a great performance
> improvment.

OTOH it does make the the code more complicated. I'm agnostic about whether=
=20
this patch should be applied.

Paul