From: Alex Bennée
Date: Fri, 13 Jun 2014 15:56:12 +0100
Message-ID: <874mzoon52.fsf@linaro.org>
Subject: Re: [Qemu-devel] [Discuss] Qemu TCG-IR VS LLVM IR
To: Chaos Shu
Cc: qemu-devel@nongnu.org

Chaos Shu writes:

> Hi all
>
> Recently I am investigating is there better BT solution? I got two kinds of
> popular method.
> According to their finally test[1][2]. Seems that LLVM IR method is slower
> than Qemu's TCG-IR. But according last reply from linaro engineer once work
> in Transitive, the QuickTransit is much better in performance, it uses IR
> and DAG just as LLVM IR does.

I think you're focusing too much on one aspect of the design differences
between the two translators. While IR-based approaches do make some things
easier, they introduce other problems you need to solve. Typically, when
you build a DAG you get automatic dead code elimination if, say, a register
is given a new value before its previous value has been used for anything
else. Suddenly you need a mechanism for resolving exception state for
signals that arrive between the two definitions. In QEMU the block is just
re-translated without any optimisation.
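
To make the dead-definition problem concrete, here is a minimal sketch in Python. It is a toy model, not QEMU or LLVM code: the op format, the register names, and the helpers `eliminate_dead_defs` and `run` are all invented for illustration. It drops a register definition that is overwritten before being read, then shows that the guest state seen at a fault between the two definitions differs from what unoptimised execution would produce.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical op format: (destination register, function, source registers).
Op = Tuple[str, Callable[..., int], Tuple[str, ...]]


def eliminate_dead_defs(block: List[Op]) -> List[int]:
    """Return indices of ops kept after dropping definitions that are
    overwritten before being read. All registers are live at block exit."""
    overwritten = set()  # registers redefined later with no read in between
    kept = []
    for i in range(len(block) - 1, -1, -1):
        dest, _fn, srcs = block[i]
        if dest in overwritten:
            continue  # dead: a later op redefines dest before any read
        kept.append(i)
        overwritten.add(dest)
        overwritten.difference_update(srcs)  # sources are read here
    kept.reverse()
    return kept


def run(block: List[Op], indices: Iterable[int], regs: Dict[str, int],
        stop_before: Optional[int] = None) -> Dict[str, int]:
    """Execute the ops at `indices` in order. If `stop_before` is given,
    stop before that op, modelling a fault raised by it."""
    state = dict(regs)
    for i in indices:
        if stop_before is not None and i >= stop_before:
            break
        dest, fn, srcs = block[i]
        state[dest] = fn(*(state[s] for s in srcs))
    return state


BLOCK: List[Op] = [
    ("r0", lambda a: a + 1, ("r1",)),  # first definition of r0, never read
    ("r2", lambda a: a * 2, ("r3",)),  # stands in for a load that may fault
    ("r0", lambda a: a + 5, ("r2",)),  # second definition of r0
]
REGS = {"r0": 0, "r1": 10, "r2": 0, "r3": 4}
ALL = range(len(BLOCK))
KEPT = eliminate_dead_defs(BLOCK)

# Without a fault, optimised and unoptimised runs agree at block exit.
same_at_exit = run(BLOCK, KEPT, REGS) == run(BLOCK, ALL, REGS)

# With a fault at op 1, the optimised run never wrote the first r0.
optimised_at_fault = run(BLOCK, KEPT, REGS, stop_before=1)
precise_at_fault = run(BLOCK, ALL, REGS, stop_before=1)
```

Replaying the unoptimised ops up to the faulting one, as `precise_at_fault` does, is only a loose analogue of re-translating the block without optimisation; a real translator also has to map a host fault address back to the guest instruction.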

Just because QEMU doesn't use IR doesn't mean it can't optimise the
operations; the result might be a little less elegant, but it can do it.
Even then you need to ask yourself whether changing the entire TCG engine
gains you enough. Looking at a quick perf dump from my current work:

  37.72%  perf-28173.map       [.] 0x00007fd52184181a
  18.25%  qemu-system-aarch64  [.] cpu_arm_exec
   7.80%  qemu-system-aarch64  [.] phys_page_find
   4.54%  qemu-system-aarch64  [.] get_phys_addr_lpae
   3.72%  qemu-system-aarch64  [.] address_space_translate_internal
   3.35%  qemu-system-aarch64  [.] address_space_translate
   2.11%  qemu-system-aarch64  [.] tlb_set_page
   ...

So less than 50% of the time is spent in translated code. This suggests
there are plenty of other places we could look for performance
improvements, and that's before we talk about tackling things like safely
using threads and utilising more than one core in TCG-based system
emulation. That 37% figure isn't overly helpful either. We need to look at
what the breakdown is for hot blocks (the 80/20 rule) and whether the
current TCG can improve.

> And what's more, I found result from ICT/Loongson, they work on Qemu-TCG
> years and opt on IR and devote much to hardware register mapping and
> peephole-like opt on generated code after TCG, and finally seems to get a
> good-ending.

Don't misunderstand me: these LLVM experiments are very interesting and
offer potential avenues to explore. But if you really want to compare the
approaches, I suspect it would be better to build an IR-based translator
from scratch with some thought to design rather than trying to bolt it on
to a different system.

> Those two directions, which one is better? I mean which one can be the
> finally product level app in future arm/x86 competition.
>
> [1]: https://code.google.com/p/llvm-qemu/wiki/Status
> [2]: http://infoscience.epfl.ch/record/149975/files/x86-llvm-translator-chipounov_2.pdf

-- 
Alex Bennée