From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:59292)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1UgrPY-0007MQ-Jg
	for qemu-devel@nongnu.org; Mon, 27 May 2013 03:00:09 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1UgrPT-0003CX-BY
	for qemu-devel@nongnu.org; Mon, 27 May 2013 03:00:04 -0400
Received: from mail-ea0-x230.google.com ([2a00:1450:4013:c01::230]:61945)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1UgrPT-0003CT-5H
	for qemu-devel@nongnu.org; Mon, 27 May 2013 02:59:59 -0400
Received: by mail-ea0-f176.google.com with SMTP id k11so3796688eaj.35
	for <qemu-devel@nongnu.org>; Sun, 26 May 2013 23:59:58 -0700 (PDT)
Sender: Paolo Bonzini <paolo.bonzini@gmail.com>
Message-ID: <51A30466.6020703@redhat.com>
Date: Mon, 27 May 2013 08:59:50 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <CALBwSP0u6MxDH6Adt3XV_iVf7hqvqYFw8nWAUGz12iXdGGu9Cw@mail.gmail.com>
	<51A10BCA.6000800@suse.de>
	<CALBwSP39qnyENkvj4PtySHYnfVc7tkc6wM0nAxDsw5AtiEG=FA@mail.gmail.com>
	<CAFEAcA9Ce0XrEJb+CTSDCaU=1C2JmcHz4ZMt+aY-nr-mR3X4rg@mail.gmail.com>
	<CALBwSP1s+U57w_qgR66qwqFhXorJjkcAjBMHpYafu=Ne7KoBgg@mail.gmail.com>
In-Reply-To: <CALBwSP1s+U57w_qgR66qwqFhXorJjkcAjBMHpYafu=Ne7KoBgg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Potential to accelerate QEMU for specific
	architectures
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Lior Vernia <liorvern@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, =?UTF-8?B?QW5kcmVhcyBGw6RyYmVy?= <afaerber@suse.de>, qemu-devel@nongnu.org, =?UTF-8?B?6Zmz6Z+L5Lu7?= <chenwj@iis.sinica.edu.tw>, Richard Henderson <rth@twiddle.net>

On 26/05/2013 18:35, Lior Vernia wrote:
> What about no to the first bullet but yes to the second (just x86 on
> ARM)? Any room for significant improvement in that case, starting from
> the foundations of QEMU?

You could write a target-specific translator, yes.  But first of all I
would establish whether you're using 32- or 64-bit, and run some
profiling to see where the hotspot is in your case.

I know that in some scenarios helpers for SSE take a considerable amount
of time (5-10%).  You could look at adding SIMD data types to TCG, and
map them to Neon operations or even to fully-unrolled loops.
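
To make the "fully-unrolled loop" option concrete, here is a minimal
sketch in plain C of a lane-wise 16-bit compare with pcmpeqw semantics.
The Vec128 type and the vec_cmpeq16 name are made up for illustration;
they are not existing TCG or QEMU interfaces.  The loop has a fixed trip
count of eight, so a backend could unroll it completely or, on ARM,
replace it with a single Neon compare such as vceqq_u16.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector value viewed as eight 16-bit lanes.
 * This is an illustration only, not a type that exists in TCG. */
typedef struct {
    uint16_t w[8];
} Vec128;

/* Lane-wise 16-bit equality compare with pcmpeqw semantics:
 * a result lane is 0xFFFF where the inputs match and 0 elsewhere.
 * The trip count is constant, so the loop can be fully unrolled. */
static inline Vec128 vec_cmpeq16(Vec128 a, Vec128 b)
{
    Vec128 r;
    for (int i = 0; i < 8; i++) {
        r.w[i] = (a.w[i] == b.w[i]) ? 0xFFFF : 0x0000;
    }
    return r;
}
```

Keeping the operation as one typed vector op, rather than an opaque
helper call, is what would let the code generator choose between the
unrolled form and a native SIMD instruction.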

As other work has shown, ahead-of-time translation can also do many more
optimizations, including very aggressive dead-code elimination.  For
example, again considering SSE, something like

     pcmpeqw  %xmm0, %xmm1
     pmovmskb %xmm1, %eax
     test     %eax, %eax
     jz       ...

will be translated to a slow sequence in QEMU because of the expensive
pmovmskb.  A custom code generator can observe that %eax is dead after
the test (only the flags are consumed) and use a better translation of
this idiom.
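
As a rough sketch of what "a better translation" could mean, the C below
models the branch decision of that four-instruction sequence in two
ways.  All function names are hypothetical and this is not QEMU code.
The literal version builds the 16-bit byte mask the way pmovmskb does
and then tests it; the fused version relies on the mask being dead and
only answers the question the jz actually asks, namely whether no 16-bit
lane compared equal.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Collect the top bit of each of the 16 bytes, like pmovmskb. */
static uint32_t movmskb(const uint8_t v[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint32_t)(v[i] >> 7) << i;
    }
    return mask;
}

/* Literal translation: pcmpeqw, pmovmskb, test, jz.
 * Returns true when the jz branch would be taken. */
static bool jz_taken_literal(const uint16_t a[8], const uint16_t b[8])
{
    uint16_t eq[8];
    uint8_t bytes[16];

    for (int i = 0; i < 8; i++) {
        eq[i] = (a[i] == b[i]) ? 0xFFFF : 0x0000;   /* pcmpeqw */
    }
    /* Lanes are all-ones or all-zeros, so byte order does not matter. */
    memcpy(bytes, eq, sizeof bytes);
    return movmskb(bytes) == 0;                     /* test + jz */
}

/* Fused translation, valid only if the mask value is dead afterwards:
 * the branch is taken exactly when no lane is equal. */
static bool jz_taken_fused(const uint16_t a[8], const uint16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        if (a[i] == b[i]) {
            return false;
        }
    }
    return true;
}
```

Both functions return the same result for every input; the fused one
simply never materializes the mask, which is the part that is costly to
emulate.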

Also, floating-point emulation is always done in software in QEMU
because guest and host floating-point representations differ (and
because of the 80-bit floating-point registers mostly used by 32-bit
x86).  This is going to be slow no matter what.

Paolo