From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Emilio G. Cota" <cota@braap.org>
Cc: qemu-trivial@nongnu.org, Stefan Weil <sw@weilnetz.de>,
qemu-devel@nongnu.org, Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-trivial] [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
Date: Fri, 27 Mar 2015 09:55:03 +0000
Message-ID: <87y4mibw94.fsf@linaro.org>
In-Reply-To: <1427313048-26772-1-git-send-email-cota@braap.org>
Emilio G. Cota <cota@braap.org> writes:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
Have you been able to measure any performance improvement with the new
layout? In theory performance should improve if the smaller struct lines
up better with cache lines, but real numbers would be nice.
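The size figures themselves are easy to confirm in isolation. Below is a
standalone sketch, not the real tcg.h: it assumes tcg_target_long is a
64-bit integer (as on a 64-bit host), assumes the flag hidden between the
last two hunks is called temp_local, and uses the made-up names
TCGTempOld, TCGTempNew and tcgtemp_bytes_saved() purely for the
comparison. It says nothing about run-time performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for tcg_target_long on a 64-bit host. */
typedef int64_t tcg_target_long;

typedef enum TCGType { TCG_TYPE_I32, TCG_TYPE_I64, TCG_TYPE_COUNT } TCGType;

#define TCG_TYPE_NR_BITS 1
#define TEMP_VAL_NR_BITS 2

/* Layout before the patch (hypothetical name). */
typedef struct TCGTempOld {
    TCGType base_type;
    TCGType type;
    int val_type;
    int reg;
    tcg_target_long val;
    int mem_reg;
    intptr_t mem_offset;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;      /* name assumed, not visible in the hunk */
    unsigned int temp_allocated:1;
    const char *name;
} TCGTempOld;

/* Layout after the patch (hypothetical name): all small fields share
 * one unsigned int, followed by the pointer-sized members. */
typedef struct TCGTempNew {
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int val_type:TEMP_VAL_NR_BITS;
    unsigned int base_type:TCG_TYPE_NR_BITS;
    unsigned int type:TCG_TYPE_NR_BITS;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;      /* name assumed, not visible in the hunk */
    unsigned int temp_allocated:1;
    tcg_target_long val;
    intptr_t mem_offset;
    const char *name;
} TCGTempNew;

/* The packed layout must not be larger than the old one. */
static_assert(sizeof(TCGTempNew) < sizeof(TCGTempOld),
              "packing should shrink TCGTemp");

/* Bytes saved per array element by the new layout. */
static inline size_t tcgtemp_bytes_saved(void)
{
    return sizeof(TCGTempOld) - sizeof(TCGTempNew);
}
```

Multiplying tcgtemp_bytes_saved() by the number of elements in the temps
array gives the "few KBs" mentioned below.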
>
> The appended patch adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
I can see the defines but I can't see any checks. Could we add a
compile-time check that fails the build if TCG_TYPE_COUNT doesn't fit
into TCG_TYPE_NR_BITS?
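Something along these lines is what I have in mind. This is an untested
sketch using plain C11 static_assert so that it stands alone; in the tree
it would presumably be spelled with QEMU_BUILD_BUG_ON instead, and the
enum here is trimmed to the entries visible in the hunk.

```c
#include <assert.h>

typedef enum TCGType {
    TCG_TYPE_I32,
    TCG_TYPE_I64,
    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
} TCGType;

/* used for bitfield packing to save space */
#define TCG_TYPE_NR_BITS 1

/* The largest valid value is TCG_TYPE_COUNT - 1, so the bitfield is wide
 * enough as long as TCG_TYPE_COUNT <= 2^TCG_TYPE_NR_BITS. Adding a third
 * type without bumping TCG_TYPE_NR_BITS then breaks the build. */
static_assert(TCG_TYPE_COUNT <= (1 << TCG_TYPE_NR_BITS),
              "TCG_TYPE_NR_BITS is too small to hold every TCGType");
```

With that in place the cross-reference in the TCG_TYPE_COUNT comment is
enforced by the compiler rather than relying on whoever adds the next
type to notice it.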
>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> it is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
> tcg/tcg.h | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
>      /* An alias for the size of the host register. */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +
>  /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
>  typedef enum TCGMemOp {
>      MO_8 = 0,
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG 1
>  #define TEMP_VAL_MEM 2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2
A similar compile-time check could be added here.
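For example (again untested, and standalone rather than in-tree): since
the TEMP_VAL_* values are plain defines rather than an enum, there is no
COUNT entry to test against, so this sketch checks the largest value
visible in the hunk. The probe struct and temp_val_roundtrip() are
made-up names that only illustrate what happens without the check.

```c
#include <assert.h>

#define TEMP_VAL_REG 1
#define TEMP_VAL_MEM 2
#define TEMP_VAL_CONST 3
#define TEMP_VAL_NR_BITS 2

/* TEMP_VAL_CONST is the largest value shown in the hunk; if a larger
 * TEMP_VAL_* is ever added, this assert has to name it instead. */
static_assert(TEMP_VAL_CONST < (1 << TEMP_VAL_NR_BITS),
              "TEMP_VAL_NR_BITS is too small to hold every TEMP_VAL_*");

/* Hypothetical probe with a field of the same width as val_type. */
struct temp_val_probe {
    unsigned int val_type:TEMP_VAL_NR_BITS;
};

/* Store v in the bitfield and read it back. Values that do not fit are
 * silently reduced modulo 2^TEMP_VAL_NR_BITS, which is the failure mode
 * the static_assert is meant to catch at build time. */
static inline unsigned int temp_val_roundtrip(unsigned int v)
{
    struct temp_val_probe p = { .val_type = v };
    return p.val_type;
}
```

A hypothetical fifth value of 4 would read back as 0 from a 2-bit field,
with at most a compiler warning, so a build-time failure seems preferable.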
>
> -/* XXX: optimize memory layout */
>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> @@ -438,6 +439,9 @@ typedef struct TCGTemp {
>                                    basic blocks. Otherwise, it is not
>                                    preserved across basic blocks. */
>      unsigned int temp_allocated:1; /* never used for code gen */
> +
> +    tcg_target_long val;
> +    intptr_t mem_offset;
>      const char *name;
>  } TCGTemp;
--
Alex Bennée