From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:53096)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1TSiou-00063U-Ul for qemu-devel@nongnu.org;
 Mon, 29 Oct 2012 02:27:34 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1TSiot-0007dL-D5 for qemu-devel@nongnu.org;
 Mon, 29 Oct 2012 02:27:32 -0400
Received: from mailout3.w1.samsung.com ([210.118.77.13]:44025)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1TSiot-0007d1-3i for qemu-devel@nongnu.org;
 Mon, 29 Oct 2012 02:27:31 -0400
Received: from eusync4.samsung.com (mailout3.w1.samsung.com [210.118.77.13])
 by mailout3.w1.samsung.com (Oracle Communications Messaging Server
 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP
 id <0MCN00AWL5YICW20@mailout3.w1.samsung.com> for qemu-devel@nongnu.org;
 Mon, 29 Oct 2012 06:27:54 +0000 (GMT)
Received: from [106.109.8.9] by eusync4.samsung.com
 (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit
 (built Nov 17 2011)) with ESMTPA
 id <0MCN0073G5XOLI20@eusync4.samsung.com> for qemu-devel@nongnu.org;
 Mon, 29 Oct 2012 06:27:26 +0000 (GMT)
Message-id: <508E21CB.5060506@samsung.com>
Date: Mon, 29 Oct 2012 10:27:23 +0400
From: Evgeny Voevodin
MIME-version: 1.0
References: <1350973278-2236-1-git-send-email-e.voevodin@samsung.com>
 <5088E020.1000305@samsung.com> <508A2E84.8010000@samsung.com>
Content-type: text/plain; charset=UTF-8; format=flowed
Content-transfer-encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 0/7] TCG global variables clean-up
To: Blue Swirl
Cc: edgar.iglesias@gmail.com, kyungmin.park@samsung.com,
 qemu-devel@nongnu.org, aurelien@aurel32.net, rth@twiddle.net

On 10/27/2012 06:34 PM, Blue Swirl wrote:
> On Fri, Oct 26, 2012 at 6:32 AM, Evgeny Voevodin wrote:
>> Today I made more precise testing with usage of --enable-profiler.
>>
>> Here is the test procedure:
>> 1. Boot Linux Kernel 5 times.
>> 2. For each iteration wait while "JIT cycles" is stable for ~10 seconds
>> 3. Write down the "cycles/op"
>>
>> Here are the results:
>>
>> Before clean-up:
>> min: 731.9
>> max: 735.8
>> avg: 734.3
>> standard deviation: ~2 = 0.3%
>> Average cycles/op = 734 +- 2
>>
>> After clean-up:
>> min: 747.2
>> max: 751.7
>> avg: 750.5
>> standard deviation: ~2 = 0.3%
>> Average cycles/op = 750 +- 2
>> Slow-down of TCG code generation = 2.2%
>>
>>
>> After clean-up with TCGContext *const tcg_cur_ctx:
>> min: 730.6
>> max: 733.2
>> avg: 728.7
>> standard deviation: ~2 = 0.3%
>> Average cycles/op = 729 +- 2
>> Slow-down of TCG code generation = 0%
>>
>> I suggest defining tcg_cur_ctx as TCGContext *const.
>> Then we will get rid of the TCG code generation slow-down and also
>> will have no usage of global variables.
> How does this compare with the original version without pointers? I
> think that version may be safer to be assumed to be optimized by the
> compiler.

I did more testing with different gcc versions and different patch series:

gcc version    v1 clean-up,    v2 clean-up,     master
               no pointer      const pointer
gcc-4.4        754.3           752.1            769.8
gcc-4.5        770.8           779.8            774.8
gcc-4.6        731.8           729.8            737

Conclusion:
- The first clean-up series, without the pointer, is faster than master in
  all cases. This is probably because data is cached more efficiently.
- The second clean-up series, with the constant pointer, is faster than
  master with gcc-4.4 and gcc-4.6. With gcc-4.5 it seems that the const
  pointer is not optimised as I assumed.

I think it is worth generating a third series without the pointer and with
the code clean-up from the second series included.
What do you think?
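
For illustration, the three access styles compared in the table can be sketched as below. This is a minimal sketch and not the actual QEMU code: the reduced TCGContext layout, OPC_BUF_SIZE, tcg_cur_ctx_mutable and the emit_* helpers are assumptions made up for the example; only tcg_ctx, tcg_cur_ctx, gen_opc_buf and gen_opc_ptr are names from the series. With a plain pointer the compiler generally has to load the pointer value before each access, whereas a TCGContext *const initialised with the address of a global can never be re-pointed, so a compiler may (but is not required to) treat it like direct access to tcg_ctx.

```c
#include <assert.h>
#include <stdint.h>

#define OPC_BUF_SIZE 16 /* hypothetical; much smaller than the real buffer */

/* Reduced, hypothetical stand-in for QEMU's TCGContext. */
typedef struct TCGContext {
    uint16_t gen_opc_buf[OPC_BUF_SIZE];
    uint16_t *gen_opc_ptr;
} TCGContext;

TCGContext tcg_ctx;

/* v2 as posted: a plain global pointer that could be changed at run time.
 * The name tcg_cur_ctx_mutable is made up to keep both variants side by side. */
TCGContext *tcg_cur_ctx_mutable = &tcg_ctx;

/* v2 with the proposed change: the pointer itself is const. */
TCGContext *const tcg_cur_ctx = &tcg_ctx;

/* v1 style: no pointer, the global context is accessed directly. */
static void emit_direct(uint16_t opc)
{
    assert(tcg_ctx.gen_opc_ptr < tcg_ctx.gen_opc_buf + OPC_BUF_SIZE);
    *tcg_ctx.gen_opc_ptr++ = opc;
}

/* Access through the non-const pointer. */
static void emit_via_pointer(uint16_t opc)
{
    TCGContext *s = tcg_cur_ctx_mutable;
    assert(s->gen_opc_ptr < s->gen_opc_buf + OPC_BUF_SIZE);
    *s->gen_opc_ptr++ = opc;
}

/* Access through the const pointer. */
static void emit_via_const_pointer(uint16_t opc)
{
    assert(tcg_cur_ctx->gen_opc_ptr < tcg_cur_ctx->gen_opc_buf + OPC_BUF_SIZE);
    *tcg_cur_ctx->gen_opc_ptr++ = opc;
}
```

All three helpers append to the same buffer once tcg_ctx.gen_opc_ptr has been set to tcg_ctx.gen_opc_buf; whether the const variant compiles to the same code as the direct one depends on the compiler and on whether the initialiser is visible in the translation unit, which would be consistent with the gcc-4.5 column above.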
>>
>> On 10/25/2012 10:45 AM, Evgeny Voevodin wrote:
>>> Here are the results of tests before and after this patch series was
>>> applied:
>>>
>>> * EEMBC CoreMark (before -> after)
>>> - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
>>> - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
>>> - Results: 1148.105626 -> 1161.440186 (+1.16%)
>>>
>>> * nbench (before -> after)
>>> - Guest: Exynos4210 ARMv7, Linux (Custom buildroot image)
>>> - Host: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz, 4GB RAM, Linux
>>> - Results
>>> . MEMORY INDEX: 1.864 -> 1.862 (-0.11%)
>>> . INTEGER INDEX: 2.518 -> 2.523 (+0.2%)
>>> . FLOATING-POINT INDEX: 0.385 -> 0.394 (+2.34%)
>>>
>>>
>>> Those tests show that it became even faster :))
>>>
>>> But I'm quite sceptical about such results.
>>> The thing is that nbench prints a warning if its results are
>>> not 95% statistically accurate.
>>> So we can be sure that the nbench result is 95% accurate.
>>> And it's obvious that the results shown above are within the scope of
>>> this accuracy.
>>> I don't know the accuracy of CoreMark.
>>>
>>> So, the main conclusion we can draw is that this patch series didn't
>>> introduce any slow-down larger than the inaccuracy of the measurement.
>>>
>>> Is this enough?
>>>
>>> On 10/23/2012 10:21 AM, Evgeny Voevodin wrote:
>>>> This set of patches moves global variables to tcg_ctx:
>>>> gen_opc_ptr
>>>> gen_opparam_ptr
>>>> gen_opc_buf
>>>> gen_opparam_buf
>>>>
>>>> Build tested for all targets.
>>>> Execution tested on ARM.
>>>>
>>>> I didn't notice any slow-down of kernel boot after this set was applied.
>>>>
>>>> Changelog:
>>>> v1->v2:
>>>> Introduced TCGContext *tcg_cur_ctx global to use in those places where
>>>> we don't have an interface to pass pointer to tcg_ctx.
>>>> Code style clean-up
>>>>
>>>> Evgeny (2):
>>>>   tcg/tcg.h: Duplicate global TCG variables in TCGContext
>>>>   TCG: Remove unused global variables
>>>>
>>>> Evgeny Voevodin (5):
>>>>   translate-all.c: Introduce TCGContext *tcg_cur_ctx
>>>>   TCG: Use gen_opc_ptr from context instead of global variable.
>>>>   TCG: Use gen_opparam_ptr from context instead of global variable.
>>>>   TCG: Use gen_opc_buf from context instead of global variable.
>>>>   TCG: Use gen_opparam_buf from context instead of global variable.
>>>>
>>>>  gen-icount.h                  |    2 +-
>>>>  target-alpha/translate.c      |   10 +-
>>>>  target-arm/translate.c        |   10 +-
>>>>  target-cris/translate.c       |   13 +-
>>>>  target-i386/translate.c       |   10 +-
>>>>  target-lm32/translate.c       |   13 +-
>>>>  target-m68k/translate.c       |   10 +-
>>>>  target-microblaze/translate.c |   13 +-
>>>>  target-mips/translate.c       |   11 +-
>>>>  target-openrisc/translate.c   |   13 +-
>>>>  target-ppc/translate.c        |   11 +-
>>>>  target-s390x/translate.c      |   11 +-
>>>>  target-sh4/translate.c        |   10 +-
>>>>  target-sparc/translate.c      |   10 +-
>>>>  target-unicore32/translate.c  |   10 +-
>>>>  target-xtensa/translate.c     |    8 +-
>>>>  tcg/optimize.c                |   62 ++++----
>>>>  tcg/tcg-op.h                  |  324 ++++++++++++++++++++---------------------
>>>>  tcg/tcg.c                     |   85 ++++++-----
>>>>  tcg/tcg.h                     |   11 +-
>>>>  translate-all.c               |    4 +-
>>>>  21 files changed, 328 insertions(+), 323 deletions(-)
>>>>
>>>
>>
>>
>> --
>> Kind regards,
>> Evgeny Voevodin,
>> Technical Leader,
>> Mobile Group,
>> Samsung Moscow Research Center,
>> e-mail: e.voevodin@samsung.com

--
Kind regards,
Evgeny Voevodin,
Technical Leader,
Mobile Group,
Samsung Moscow Research Center,
e-mail: e.voevodin@samsung.com
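
As a cross-check of the slow-down percentages quoted in this thread, the arithmetic can be sketched as below. This is only an illustration: slowdown_percent and print_slowdowns are hypothetical helper names, and the inputs are the average cycles/op figures quoted in the thread (734.3 before the clean-up, 750.5 after it, 728.7 with TCGContext *const).

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of `after` versus `before`, in percent.
 * A positive value means more cycles/op, i.e. slower code generation. */
static double slowdown_percent(double before, double after)
{
    assert(before > 0.0); /* guard against division by zero */
    return (after - before) / before * 100.0;
}

/* Hypothetical helper: prints the change for the quoted averages. */
static void print_slowdowns(void)
{
    const double before = 734.3;      /* avg cycles/op before clean-up */
    const double after = 750.5;       /* avg cycles/op after clean-up */
    const double after_const = 728.7; /* avg with TCGContext *const */

    printf("after clean-up: %+.1f%%\n", slowdown_percent(before, after));
    printf("const pointer:  %+.1f%%\n", slowdown_percent(before, after_const));
}
```

For the first pair this gives roughly +2.2%, matching the quoted slow-down; for the const-pointer average the result is negative, that is, no slow-down, which the thread reports as 0%.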