From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [Qemu-devel] PreP kernels boot using Qemu
From: "J. Mayer" <l_indien@magic.fr>
In-Reply-To: <471E6ED2.6020003@aurel32.net>
References: <1193038567.16781.108.camel@rapid>
	<20071022162810.GA12778@hall.aurel32.net>
	<1193087522.16781.121.camel@rapid> <471D1E98.50303@aurel32.net>
	<1193092572.16781.128.camel@rapid>
	<20071023114737.GD25397@networkno.de>
	<1193176403.16781.189.camel@rapid>  <471E6ED2.6020003@aurel32.net>
Content-Type: text/plain; charset=ISO-8859-15
Date: Wed, 24 Oct 2007 01:06:07 +0200
Message-Id: <1193180767.16781.215.camel@rapid>
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Reply-To: qemu-devel@nongnu.org
To: qemu-devel@nongnu.org
Cc: Aurelien Jarno <aurelien@aurel32.net>


On Tue, 2007-10-23 at 23:59 +0200, Aurelien Jarno wrote:
> J. Mayer wrote:
> > On Tue, 2007-10-23 at 12:47 +0100, Thiemo Seufer wrote:
> >> J. Mayer wrote:
> >>> On Tue, 2007-10-23 at 00:05 +0200, Aurelien Jarno wrote:
> >>>> J. Mayer wrote:
> >>>>> On Mon, 2007-10-22 at 18:28 +0200, Aurelien Jarno wrote:
> >>>>>> On Mon, Oct 22, 2007 at 09:36:07AM +0200, J. Mayer wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I've been investigating more about PreP kernel boot using Qemu and I
> >>>>>>> managed to boot 2.4.35, 2.6.12 and 2.6.22 kernels using Qemu CVS and
> >>>>>>> unmodified OHW.
> >>> [...]
> >>>>>> - The "floating point" problem I reported during the weekend does not
> >>>>>>   exist, probably because of the switch from powerpc to ppc. I still
> >>>>>>   don't know if it is a kernel problem or a QEMU problem (or both).
> >>>>> There may be issues with the floating point emulation, especially if
> >>>>> some kernels or programs rely on the FPSCR (floating-point status and
> >>>>> control) register, which is never updated in Qemu.
> >>>>>
> >>>> Is there any technical reason behind that, or is it just a lack of
> >>>> time?
> >>> I can say both:
> >>> for most programs, using floating-point arithmetic à la "fast-math", it's
> >>> not necessary to maintain a precise FPU state, as those programs will
> >>> never raise any FPU exception, never generate NaNs, infinities, ...
> >>> The other reason is that a precise emulation would require checking
> >>> every FPU insn's arguments and results at run time and handling all
> >>> special cases the way actual PowerPC implementations do.
> >>> This behavior could, for example, be selected at compile time: one
> >>> would then have the choice between a quick FPU emulation model and a
> >>> precise one.
> >> For mips I chose the middle ground: The emulation is architecturally
> >> correct but may not reflect FPU behaviour of the specific silicon.
> >> E.g. one effect is that in certain cases the emulation computes values
> >> close to underflow, while real hardware would throw the (mips FPU
> >> specific) unimplemented exception.
> >>
> >> For most cases this should be good enough, since only specialized
> >> software will rely on a specific implementation's oddities.
> >
> > Well, what you've done for Mips is exactly what I called the "precise
> > emulation", and it is far slower than the "fast math" emulation I have
> > for PowerPC. I was wrong to talk about "PowerPC implementations" when I
> > should have said "PowerPC specification"; but there should be no
> > difference between the two (or it's not a PowerPC CPU...) because the
> > POWER/PowerPC specification describes the behavior of the FPU very
> > precisely.
> > The "fast math" model relies on the softfloat-native code and is
> > sufficient for most applications. But there are a few instructions that
> > should always take care of (or at least reset) the FPSCR register, which
> > is not done in the current code.
> >
>
> Then I guess that is what has been done on the SPARC target: after each FP
> instruction, check_ieee_exceptions() is called to accumulate the IEEE
> exceptions and generate real exceptions if they are enabled.
>
> That doesn't look really complex, but I agree that it could slow down the
> emulation a bit. I will take a closer look in two or three weeks.

It's not so complex. What would greatly slow down the emulation is that
you need to use the softfloat model instead of the softfloat-native one
for this to produce the expected result.
The PowerPC "fadd" instruction compiles to just 3 insns on amd64 with
the "fast math" model:
movlpd 0x1b8(%r14),%xmm0 ; /* Load env->ft1 into an SSE (XMM) register */
addsd  0x1b0(%r14),%xmm0 ; /* Add env->ft0 */
movsd  %xmm0,0x1b0(%r14) ; /* Store the result into env->ft0 */
With the "precise" model, you need to:
1/ Clear the floating point flags
2/ Load operands from env->ft0 & env->ft1 into host registers
3/ Call the float64_add function
4/ Store the result into env->ft0
5/ Compute the architecture specific FPU flags
This means executing much more code for each FPU operation and consuming
much more space in the TB (translation block) buffer.
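To make the cost difference concrete, here is a minimal C sketch of the two models for fadd. It is not Qemu code: cpu_state, op_fadd_fast(), op_fadd_precise() and is_snan() are hypothetical names. Instead of clearing host flags and calling softfloat's float64_add() (steps 1 to 3 above), the precise variant does the host addition and then derives the IEEE flags in software, using the TwoSum error-free transformation to detect inexact results; this assumes round-to-nearest and strict double evaluation (no x87 extended precision, no -ffast-math). FPSCR bits use LSB-0 numbering (bit 31 is FX, bit 0 in the PowerPC books); underflow, the VXSNAN/VXISI detail bits, FPRF/FR/FI and the delivery of an actual program exception are left out.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* FPSCR masks, LSB-0 numbering (bit 31 = FX, i.e. bit 0 in the PowerPC books). */
#define FPSCR_FX  (1u << 31) /* exception summary (sticky)      */
#define FPSCR_FEX (1u << 30) /* enabled exception summary       */
#define FPSCR_VX  (1u << 29) /* invalid operation summary       */
#define FPSCR_OX  (1u << 28) /* overflow                        */
#define FPSCR_UX  (1u << 27) /* underflow (not computed here)   */
#define FPSCR_ZX  (1u << 26) /* zero divide (never from an add) */
#define FPSCR_XX  (1u << 25) /* inexact                         */
#define FPSCR_VE  (1u << 7)  /* ...and the matching enable bits */
#define FPSCR_OE  (1u << 6)
#define FPSCR_UE  (1u << 5)
#define FPSCR_ZE  (1u << 4)
#define FPSCR_XE  (1u << 3)

/* Hypothetical, trimmed-down CPU state (not Qemu's real PowerPC env). */
typedef struct {
    double ft0, ft1;
    uint32_t fpscr;
} cpu_state;

/* "Fast math" model: a single host add, FPSCR left untouched. */
static void op_fadd_fast(cpu_state *env)
{
    env->ft0 += env->ft1;
}

/* Signaling NaN: exponent all ones, non-zero fraction, quiet bit (51) clear. */
static int is_snan(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7ff0000000000000ull) == 0x7ff0000000000000ull &&
           (bits & 0x000fffffffffffffull) != 0 &&
           (bits & 0x0008000000000000ull) == 0;
}

/* "Precise" model: same add, then derive the IEEE flags in software and
 * fold them into the FPSCR. */
static void op_fadd_precise(cpu_state *env)
{
    double a = env->ft0, b = env->ft1;
    double s = a + b;
    uint32_t flags = 0;

    if (is_snan(a) || is_snan(b) ||
        (isinf(a) && isinf(b) && (signbit(a) != 0) != (signbit(b) != 0))) {
        flags |= FPSCR_VX;              /* SNaN operand or inf - inf (detail bits omitted) */
    } else if (isinf(s) && isfinite(a) && isfinite(b)) {
        flags |= FPSCR_OX | FPSCR_XX;   /* overflow; the rounded result is inexact too */
    } else if (isfinite(s)) {
        double bv = s - a;              /* TwoSum: err is the exact rounding error */
        double err = (a - (s - bv)) + (b - bv);
        if (err != 0.0)
            flags |= FPSCR_XX;
    }

    /* Exception bits are sticky; FX is set when one goes from 0 to 1. */
    if (flags & ~env->fpscr)
        env->fpscr |= FPSCR_FX;
    env->fpscr |= flags;

    /* FEX is recomputed: is any raised exception also enabled? */
    uint32_t f = env->fpscr;
    if (((f & FPSCR_VX) && (f & FPSCR_VE)) || ((f & FPSCR_OX) && (f & FPSCR_OE)) ||
        ((f & FPSCR_UX) && (f & FPSCR_UE)) || ((f & FPSCR_ZX) && (f & FPSCR_ZE)) ||
        ((f & FPSCR_XX) && (f & FPSCR_XE)))
        env->fpscr |= FPSCR_FEX;
    else
        env->fpscr &= ~FPSCR_FEX;

    env->ft0 = s;
}
```

Even in this reduced form, the precise path adds a NaN/infinity classification, several compares and five extra floating-point operations per fadd, against the single addsd of the fast path.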

It's a good idea to allow the use of such a precise model when you want
to run specific applications that rely on the FPU to properly handle
NaNs and infinities and to generate exceptions correctly. But, as it's
not needed by most applications, having a "fast math" model is also
great for quicker emulation. I said it would be great to allow choosing
the model at compile time, but it could in fact be chosen at run time,
just by tweaking the code translator (which should not lead to any
performance penalty for the "fast" model case) and compiling the FPU
micro-operations twice, once with CONFIG_SOFTFLOAT defined and once
without. This way, the Qemu user could easily choose between the "fast"
and "precise" models just by changing a switch on the command line.

-- 
J. Mayer <l_indien@magic.fr>
Never organized