From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.33)
	id 1Cm4YZ-0005zw-SL
	for qemu-devel@nongnu.org; Wed, 05 Jan 2005 01:22:07 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.33)
	id 1Cm4YY-0005zO-Rv
	for qemu-devel@nongnu.org; Wed, 05 Jan 2005 01:22:07 -0500
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.33) id 1Cm4YY-0005zE-DY
	for qemu-devel@nongnu.org; Wed, 05 Jan 2005 01:22:06 -0500
Received: from [64.233.184.205] (helo=wproxy.gmail.com)
	by monty-python.gnu.org with esmtp (Exim 4.34) id 1Cm4DS-0007yb-E0
	for qemu-devel@nongnu.org; Wed, 05 Jan 2005 01:00:18 -0500
Received: by wproxy.gmail.com with SMTP id 68so108691wri
	for <qemu-devel@nongnu.org>; Tue, 04 Jan 2005 22:00:16 -0800 (PST)
Message-ID: <cd8ecdef050104220036efa970@mail.gmail.com>
Date: Wed, 5 Jan 2005 01:00:15 -0500
From: Karl Magdsick <kmagnum@gmail.com>
Subject: Re: [Qemu-devel] Endian and userspace issues
In-Reply-To: <p0610051ebe011a73c421@24.20.233.105>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
References: <p06100518be009d266600@24.20.233.105>
	<200501042016.03910.paul@codesourcery.com>
	<BE34C181-3D23-417C-9BC4-780C9A2945F7@mac.com>
	<p0610051ebe011a73c421@24.20.233.105>
Reply-To: Karl Magdsick <kmagnum@gmail.com>, qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

The difficulty comes when, for instance, you have a struct containing
a u_int16_t, followed by an int32_t[2], followed by a u_int8_t[2].  You
pass a pointer to this struct to an AES or Twofish encryption
implementation that takes a void*.  Internally, this void* is treated
as a u_int32_t*.  If the struct were little endian, the most
significant byte of the u_int16_t would end up being in the middle of
the first 32-bit word as viewed by the encryption function.  The
trouble is that the necessary type casts in the C/C++ source don't
carry through to the machine code.  You can guess where the casts
occur, but doing so reliably is difficult.
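
To make the problem concrete, here is a rough sketch in C.  All of the
helper names are made up for illustration, and it assumes the compiler
pads the u_int16_t out to 4 bytes so the int32_t array is aligned.  It
only looks at the first 32-bit word of the struct and compares what
the cipher would see on a real x86 with what it would see if a
translator had fixed up the 16-bit store on its own:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from QEMU or any cipher library. */

/* How a little-endian guest (x86) reads four bytes as one 32-bit word. */
static uint32_t read_word_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* How a big-endian host reads the same four bytes natively. */
static uint32_t read_word_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* First 4 bytes of the struct: the 16-bit field plus 2 padding bytes,
 * with the field stored in guest (little-endian) byte order. */
static void store_field_guest_order(uint8_t word[4], uint16_t v)
{
    word[0] = (uint8_t)(v & 0xff);
    word[1] = (uint8_t)(v >> 8);
    word[2] = 0;
    word[3] = 0;
}

/* The same field stored in host (big-endian) byte order, as a
 * translator that converted the 16-bit store alone would do. */
static void store_field_host_order(uint8_t word[4], uint16_t v)
{
    word[0] = (uint8_t)(v >> 8);
    word[1] = (uint8_t)(v & 0xff);
    word[2] = 0;
    word[3] = 0;
}

/* Usage: the cipher's 32-bit view differs between the two cases. */
static void example(void)
{
    uint8_t w[4];

    store_field_guest_order(w, 0xABCD);
    assert(read_word_le(w) == 0x0000ABCDu);  /* what real x86 sees */

    store_field_host_order(w, 0xABCD);
    assert(read_word_be(w) == 0xABCD0000u);  /* what the host sees */
}
```

The translator has no way to know from the machine code alone that
the later 32-bit load covers the same bytes as the earlier 16-bit
store.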

Now, the way one would first think to implement x86 emulation on a
big-endian architecture would be to implement straightforward access
to int8_ts and insert thunks for int16_ts, int32_ts, floats, doubles,
and long doubles.  In other words, the "normal" way to emulate a CPU
on top of a CPU of the opposite endianness will result in code
optimized for 8-bit memory accesses.
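
A minimal sketch of that "normal" scheme in C follows.  The names
(guest_ram, guest_load32, and so on) are hypothetical and not QEMU's
actual API.  On a real big-endian host the wide accessors would more
likely be a native load followed by a byte swap; assembling the value
from bytes is used here only so the sketch behaves the same on any
host:

```c
#include <assert.h>
#include <stdint.h>

#define GUEST_RAM_SIZE 16u

/* Guest RAM kept as a plain byte array in guest (little-endian) order. */
static uint8_t guest_ram[GUEST_RAM_SIZE];

/* 8-bit accesses need no conversion at all. */
static uint8_t guest_load8(uint32_t addr)
{
    assert(addr < GUEST_RAM_SIZE);
    return guest_ram[addr];
}

static void guest_store8(uint32_t addr, uint8_t v)
{
    assert(addr < GUEST_RAM_SIZE);
    guest_ram[addr] = v;
}

/* Wider accesses go through a thunk that reorders the bytes. */
static uint16_t guest_load16(uint32_t addr)
{
    assert(addr + 2 <= GUEST_RAM_SIZE);
    return (uint16_t)(guest_ram[addr] | (guest_ram[addr + 1] << 8));
}

static uint32_t guest_load32(uint32_t addr)
{
    assert(addr + 4 <= GUEST_RAM_SIZE);
    return (uint32_t)guest_ram[addr] |
           ((uint32_t)guest_ram[addr + 1] << 8) |
           ((uint32_t)guest_ram[addr + 2] << 16) |
           ((uint32_t)guest_ram[addr + 3] << 24);
}

static void guest_store32(uint32_t addr, uint32_t v)
{
    assert(addr + 4 <= GUEST_RAM_SIZE);
    guest_ram[addr]     = (uint8_t)v;
    guest_ram[addr + 1] = (uint8_t)(v >> 8);
    guest_ram[addr + 2] = (uint8_t)(v >> 16);
    guest_ram[addr + 3] = (uint8_t)(v >> 24);
}
```

Every access wider than a byte pays for the reordering, which is why
this layout ends up favoring 8-bit accesses.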

I imagine that the VirtualPC "big-endian x86" implementation isn't
really big-endian, but instead optimizes the memory layout for
aligned 32-bit access.  I imagine the memory is viewed as an array of
aligned 32-bit big-endian words instead of an array of bytes.
Loading (and storing) int32_ts and floats requires no endian changes
as long as the memory accesses are 32-bit aligned.  Loading (or
storing) an int8_t would require a simple thunk of XORing its address
with 0x3, and loading an aligned int16_t would require a simple thunk
of XORing its address with 0x2.  Doubles, long doubles, and unaligned
accesses require more complicated thunks.  However, aligned int32_ts
are the fastest accesses on 32-bit x86 CPUs, so performance-tuned code
will use aligned int32_ts wherever possible.  Optimizing memory access
for aligned 32-bit access is likely a net gain.
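
Here is a rough C sketch of that word-oriented layout.  It is only my
guess at how such a scheme could work, not a description of what
VirtualPC actually does, and all of the names are hypothetical.  Each
aligned 32-bit word is stored big-endian (written out byte by byte so
the sketch runs the same on any host), and narrower accesses XOR the
address:

```c
#include <assert.h>
#include <stdint.h>

#define RAM_SIZE 16u

/* Guest RAM as aligned 32-bit words, each stored in big-endian order
 * (which would be host-native on the assumed big-endian host). */
static uint8_t ram[RAM_SIZE];

/* Aligned 32-bit accesses: no address change and, on a big-endian
 * host, no byte reordering either. */
static void store32_aligned(uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0 && addr + 4 <= RAM_SIZE);
    ram[addr]     = (uint8_t)(v >> 24);
    ram[addr + 1] = (uint8_t)(v >> 16);
    ram[addr + 2] = (uint8_t)(v >> 8);
    ram[addr + 3] = (uint8_t)v;
}

static uint32_t load32_aligned(uint32_t addr)
{
    assert(addr % 4 == 0 && addr + 4 <= RAM_SIZE);
    return ((uint32_t)ram[addr] << 24) | ((uint32_t)ram[addr + 1] << 16) |
           ((uint32_t)ram[addr + 2] << 8) | (uint32_t)ram[addr + 3];
}

/* 8-bit accesses: XOR the guest address with 0x3. */
static uint8_t load8(uint32_t addr)
{
    assert(addr < RAM_SIZE);
    return ram[addr ^ 3u];
}

static void store8(uint32_t addr, uint8_t v)
{
    assert(addr < RAM_SIZE);
    ram[addr ^ 3u] = v;
}

/* Aligned 16-bit loads: XOR the guest address with 0x2, then read the
 * two bytes in big-endian order. */
static uint16_t load16_aligned(uint32_t addr)
{
    uint32_t a = addr ^ 2u;

    assert(addr % 2 == 0 && addr + 2 <= RAM_SIZE);
    return (uint16_t)((ram[a] << 8) | ram[a + 1]);
}
```

After store32_aligned(8, 0x11223344), load8(8) gives 0x44 and
load16_aligned(10) gives 0x1122, which is what little-endian code
expects to find at those guest addresses.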

If you're emulating an architecture that doesn't allow unaligned
memory accesses, then the gains are even greater.

Native libraries called from emulated code would still need to be
aware of the non-standard memory layout, or else the emulator will
need to have knowledge of the APIs and insert extra thunks in the
function calls between emulated and native code.
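
As an illustration of such a call-boundary thunk, here is a sketch in
C.  It assumes the word-oriented layout described earlier, where a
guest byte address is XORed with 0x3 to find the host byte.  The names
copy_from_guest, copy_to_guest, and thunk_strlen are hypothetical; the
thunk copies the guest string into an ordinary linear buffer before
handing it to the native strlen():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 16u

/* Guest RAM in the word-oriented layout (byte address XOR 0x3). */
static uint8_t ram[RAM_SIZE];

/* Copy len guest bytes into a linear host buffer. */
static void copy_from_guest(uint8_t *dst, uint32_t addr, size_t len)
{
    assert(addr <= RAM_SIZE && len <= RAM_SIZE - addr);
    for (size_t i = 0; i < len; i++)
        dst[i] = ram[(addr + (uint32_t)i) ^ 3u];
}

/* Copy a linear host buffer into guest memory. */
static void copy_to_guest(uint32_t addr, const uint8_t *src, size_t len)
{
    assert(addr <= RAM_SIZE && len <= RAM_SIZE - addr);
    for (size_t i = 0; i < len; i++)
        ram[(addr + (uint32_t)i) ^ 3u] = src[i];
}

/* Thunk for a native function that expects an ordinary C string.
 * The emulator has to know strlen()'s signature to build this. */
static size_t thunk_strlen(uint32_t guest_str, size_t max_len)
{
    char tmp[RAM_SIZE + 1];

    assert(guest_str <= RAM_SIZE);
    if (max_len > RAM_SIZE - guest_str)
        max_len = RAM_SIZE - guest_str;
    copy_from_guest((uint8_t *)tmp, guest_str, max_len);
    tmp[max_len] = '\0';  /* guarantee termination for strlen() */
    return strlen(tmp);
}
```

The copy on every call is the price of keeping the native library
unaware of the layout.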

If you were emulating only user-space, I imagine you could insert
alternative implementations of calloc() and malloc() that set aside
some space for access accounting.  You could then see if the first
access to the allocated memory was an aligned 32-bit access and mark
the allocated buffer as optimized for 32-bit aligned access; otherwise
you would use "normal" emulation.  The accounting overhead and
complexity would likely make this "mixed-endian memory layout"
emulation more trouble than it's worth.  With system emulation, you
could do something similar with accounting on a per-page or
per-kilobyte basis.
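
For what it's worth, the accounting could look something like the
following C sketch.  Everything here (tracked_alloc, note_access, the
layout enum) is made up for illustration, and it assumes the guest
sees each allocation as starting on a 4-byte boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum layout { LAYOUT_UNDECIDED, LAYOUT_WORD32, LAYOUT_BYTES };

/* Hypothetical accounting header set aside in front of the guest data. */
struct tracked_alloc {
    enum layout layout;  /* decided by the first access */
    size_t size;         /* bytes of guest data that follow */
    uint8_t data[];      /* the guest-visible buffer */
};

/* Stand-in for the replacement calloc()/malloc(). */
static struct tracked_alloc *tracked_alloc_new(size_t size)
{
    struct tracked_alloc *a = calloc(1, sizeof *a + size);

    if (a != NULL) {
        a->layout = LAYOUT_UNDECIDED;
        a->size = size;
    }
    return a;
}

/* Called by the emulator on each access.  Only the first access picks
 * the layout; later calls just report the earlier decision. */
static enum layout note_access(struct tracked_alloc *a, size_t offset,
                               unsigned width)
{
    assert(a != NULL && offset <= a->size && width <= a->size - offset);
    if (a->layout == LAYOUT_UNDECIDED) {
        if (width == 4 && offset % 4 == 0)
            a->layout = LAYOUT_WORD32;  /* word-oriented layout */
        else
            a->layout = LAYOUT_BYTES;   /* "normal" emulation */
    }
    return a->layout;
}
```

Note that the check in note_access() runs on every access, which is
exactly the overhead that makes me doubt the idea.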


-Karl

On Tue, 4 Jan 2005 20:17:13 -0800, anarkhos@vfemail.net
<anarkhos@vfemail.net> wrote:
> At 8:11 PM -0800 1/4/05, John Davidorff Pell wrote:
> >I think that part of what he is suggesting is that the code that is little endian be translated to Big endian before execution. This would make the running binary "native" in memory, and so could continue to be closely integrated with its linked libraries.
> 
> Yes, that was what I was suggesting. If this were done I don't even see the need for thunks at all.