From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 25 Oct 2012 16:15:58 +0900
From: Yeongkyoon Lee
In-reply-to: <1350716743-2812-1-git-send-email-yeongkyoon.lee@samsung.com>
Message-id: <5088E72E.9020109@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8; format=flowed
Content-transfer-encoding: QUOTED-PRINTABLE
References: <1350716743-2812-1-git-send-email-yeongkyoon.lee@samsung.com>
Subject: Re: [Qemu-devel] [RESEND PATCH v6 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
To: Yeongkyoon Lee
Cc: blauwirbel@gmail.com, qemu-devel@nongnu.org, aurelien@aurel32.net, rth@twiddle.net

On 2012-10-20 16:05, Yeongkyoon Lee wrote:
> Let me resend this patch, because it looks ignored except for the
> comment from Richard Henderson for
> which I've replied.
>
> Here is the 6th version of the series optimizing TCG qemu_ld/st code
> generation.
>
> v6:
>   - Remove an extra argument of return addr from MMU helpers
>     Instead, embed the fast path addr to the slow path for helpers to use it
>   - Change some bitwise operations to bitfields of structure
>   - Change the name of function which handles finalization of TB code
>     generation
>
> v5:
>   - Remove RFC tag
>
> v4:
>   - Remove CONFIG_SOFTMMU pre-condition from configure
>   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
>   - Remove some unnecessary comments
>
> v3:
>   - Support CONFIG_TCG_PASS_AREG0
>     (expected to get more performance enhancement than others)
>   - Remove the configure option "--enable-ldst-optimization"
>   - Make the optimization as default on i386 and x86_64 hosts
>   - Fix some mistyping and apply checkpatch.pl before committing
>   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
>   - Test linux-user-test-0.3
>
> v2:
>   - Follow the submit rule of qemu
>
> v1:
>   - Initial commit request
>
> I think the code generated from qemu_ld/st IRs is relatively heavy:
> up to 12 instructions for the TLB hit case on an i386 host.
> This patch series enhances the code quality of TCG qemu_ld/st IRs by
> reducing jumps and improving locality.
> The main idea is simple and has already been described in the comments in
> tcg-target.c: separate the slow path (TLB miss case) and generate it at the
> end of the TB.
>
> For example, the generated code from qemu_ld changes as follows.
> Before:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) Jump to next code (6)
> (5) TLB miss case: call MMU helper
> (6) ... (next code)
>
> After:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) ... (next code)
> ...
> (5) TLB miss case: call MMU helper
> (6) Jump to (8)
> (7) [embedded addr of (4)] <- never executed but read by MMU helpers
> (8) Return to next code (4)
>
> Following are some performance results measured on qemu 1.0.
> Although there was some measurement error, the improvements were not
> negligible.
>
> * EEMBC CoreMark (before -> after)
>   - Guest: i386, Linux (Tizen platform)
>   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
>   - Results: 1135.6 -> 1179.9 (+3.9%)
>
> * nbench (before -> after)
>   - Guest: i386, Linux (linux-0.2.img included in QEMU source)
>   - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
>   - Results
>     . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
>     . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
>     . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)
>
> Summarized features:
>   - The changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION" and
>     they are enabled by default on i386/x86_64 hosts
>   - Forcibly removing the macro causes a compilation error on i386/x86_64
>     hosts
>   - No implementations other than i386/x86_64 hosts yet
>
> In addition, I have tried to remove the generated code that calls the MMU
> helpers for the TLB miss case from the end of the TB, but have not found a
> good solution yet.
> In my opinion, TLB hit case performance could be degraded if the calling
> code is removed, because it needs to set runtime parameters, such as data,
> mmu index and return address, in registers or on the stack even though they
> are not used in the TLB hit case.
> This remains as a further issue.
>
> Yeongkyoon Lee (3):
>   configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
>     optimization
>   tcg: Add extended GETPC mechanism for MMU helpers with ldst
>     optimization
>   tcg: Optimize qemu_ld/st by generating slow paths at the end of a
>     block
>
>  configure             |    6 +
>  exec-all.h            |   36 +++++
>  exec.c                |   11 ++
>  softmmu_template.h    |   16 +-
>  tcg/i386/tcg-target.c |  415 +++++++++++++++++++++++++++++++++----------------
>  tcg/tcg.c             |   12 ++
>  tcg/tcg.h             |   30 ++++
>  7 files changed, 385 insertions(+), 141 deletions(-)
>
> --
> 1.7.5.4
>

Ping?
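For anyone skimming the thread, the "After" layout quoted above can be modeled roughly as follows. This is only a toy sketch in Python, not the code in tcg/i386/tcg-target.c: the class TBEmitter, its methods, the label names and the pseudo-instruction mnemonics (tlb_check, jne, load, call mmu_helper, .addr) are all made up for illustration. It only shows the ordering idea: the fast path stays contiguous, and each TLB miss case is recorded and emitted once the rest of the TB has been generated.

```python
class TBEmitter:
    """Toy model of one translation block (TB); all names are hypothetical."""

    def __init__(self):
        self.code = []        # fast-path pseudo-instructions, in program order
        self.slow_paths = []  # pending TLB-miss cases, emitted at finalization
        self.next_id = 0

    def emit(self, insn):
        self.code.append(insn)

    def emit_qemu_ld(self, dst, addr):
        n = self.next_id
        self.next_id += 1
        self.emit(f"tlb_check {addr}")      # (1) TLB check
        self.emit(f"jne miss{n}")           # (2) taken only on a TLB miss
        self.emit(f"load {dst}, [{addr}]")  # (3) TLB hit: load from host memory
        self.emit(f"ret{n}:")               # (4) next code starts here
        # Remember the slow path instead of emitting it inline.
        self.slow_paths.append((n, dst, addr))

    def finalize(self):
        """Return the full listing with all slow paths appended at the end."""
        out = list(self.code)
        for n, dst, addr in self.slow_paths:
            out.append(f"miss{n}:")
            out.append(f"call mmu_helper {dst}, {addr}")  # (5) TLB miss case
            out.append(f"jmp skip{n}")                    # (6) skip the data word
            out.append(f".addr ret{n}")                   # (7) never executed, read by helper
            out.append(f"skip{n}:")
            out.append(f"jmp ret{n}")                     # (8) return to next code
        return out


def build_example():
    tb = TBEmitter()
    tb.emit_qemu_ld("r0", "a0")
    tb.emit("add r0, 1")
    tb.emit_qemu_ld("r1", "a1")
    tb.emit("exit_tb")
    return tb.finalize()


if __name__ == "__main__":
    print("\n".join(build_example()))
```

In this model the hit case contains no unconditional jump at all, which is the "reducing jump" part of the series, and both miss cases end up after exit_tb, which is the locality part.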