Message-ID: <51A30466.6020703@redhat.com>
Date: Mon, 27 May 2013 08:59:50 +0200
From: Paolo Bonzini
References: <51A10BCA.6000800@suse.de>
Subject: Re: [Qemu-devel] Potential to accelerate QEMU for specific architectures
To: Lior Vernia
Cc: Peter Maydell, Andreas Färber, qemu-devel@nongnu.org, 陳韋任, Richard Henderson

On 26/05/2013 18:35, Lior Vernia wrote:
> What about no to the first bullet but yes to the second (just x86 on
> ARM)? Any room for significant improvement in that case, starting from
> the foundations of QEMU?

You could write a target-specific translator, yes. But first of all I
would determine whether you're using 32- or 64-bit, and run some
profiling to see where the hotspot is in your case.

I know that in some scenarios helpers for SSE take a considerable amount
of time (5-10%). You could look at adding SIMD data types to TCG, and
map them to Neon operations or even to fully-unrolled loops.

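To make the "fully-unrolled loops" option concrete, here is a minimal sketch in C of what a 16-bit lane compare could look like when written as a fixed-trip-count loop that a compiler can unroll or vectorize. The type vec128_sketch and the function pcmpeqw_sketch are hypothetical names chosen for illustration; they are not QEMU or TCG APIs.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector value viewed as eight 16-bit lanes.
 * This is an illustration only, not a QEMU type. */
typedef struct {
    uint16_t w[8];
} vec128_sketch;

/* pcmpeqw semantics: each 16-bit lane of dst becomes 0xFFFF if it
 * equals the matching lane of src, and 0x0000 otherwise.
 * The trip count is a compile-time constant, so the loop can be
 * fully unrolled or mapped to a single vector compare. */
static void pcmpeqw_sketch(vec128_sketch *dst, const vec128_sketch *src)
{
    for (int i = 0; i < 8; i++) {
        dst->w[i] = (dst->w[i] == src->w[i]) ? 0xFFFF : 0x0000;
    }
}
```

Whether this beats an out-of-line helper depends on the host and on how the value is kept between operations, which is why profiling comes first.
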
As in other work, ahead-of-time translation can also do many more
optimizations, including very aggressive dead-code elimination. For
example, again considering SSE, something like

    pcmpeqw  %xmm0, %xmm1
    pmovmskb %xmm1, %eax
    test     %eax, %eax
    jz       ...

will be translated to a slow sequence in QEMU due to the expensive
pmovmskb. A custom code generator can observe that %eax is dead after
the test and use a better translation of this idiom.

Also, floating-point emulation is always done in software in QEMU due to
different representations (and due to the 80-bit floating-point
registers mostly used by 32-bit x86). This is going to be slow no
matter what.

Paolo
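
As an illustration of the pcmpeqw/pmovmskb/test/jz idiom discussed above, the following C sketch contrasts a literal translation, which materializes the byte mask, with a fused one that only computes the branch condition. It assumes %eax is dead after the test, as in the message. All function names are hypothetical and do not correspond to QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* pmovmskb semantics: collect the top bit of each of 16 bytes
 * into the low 16 bits of the result. */
static uint32_t pmovmskb_sketch(const uint8_t bytes[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= (uint32_t)(bytes[i] >> 7) << i;
    }
    return mask;
}

/* Literal translation: compare lanes, build the mask, test it.
 * Returns true when the jz would be taken (mask is zero). */
static bool branch_taken_literal(const uint16_t a[8], const uint16_t b[8])
{
    uint8_t cmp[16];
    for (int i = 0; i < 8; i++) {
        uint8_t lane = (a[i] == b[i]) ? 0xFF : 0x00;
        cmp[2 * i] = lane;
        cmp[2 * i + 1] = lane;
    }
    return pmovmskb_sketch(cmp) == 0;
}

/* Fused translation: if the mask value itself is never read again,
 * only "did any lane compare equal?" matters for the branch. */
static bool branch_taken_fused(const uint16_t a[8], const uint16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        if (a[i] == b[i]) {
            return false; /* mask would be non-zero, jz not taken */
        }
    }
    return true; /* no lane equal, mask would be zero, jz taken */
}
```

Both functions give the same branch decision; the fused form simply skips the work of producing a value nobody reads.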