* [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
@ 2015-03-21 6:27 Emilio G. Cota
  2015-03-23 21:42 ` Stefan Weil
  0 siblings, 1 reply; 8+ messages in thread
From: Emilio G. Cota @ 2015-03-21 6:27 UTC
  To: qemu-devel; +Cc: qemu-trivial, Richard Henderson

This brings down the size of the struct from 56 to 48 bytes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..3276924 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -429,8 +429,8 @@ typedef struct TCGTemp {
     int val_type;
     int reg;
     tcg_target_long val;
-    int mem_reg;
     intptr_t mem_offset;
+    int mem_reg;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
-- 
1.9.1
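
The 8-byte saving in the patch above comes from alignment padding rather than from removing a field. The following is a minimal sketch of that effect, not QEMU code: DemoType, demo_target_long, TempBefore, TempAfter and temp_bytes_saved are hypothetical stand-ins, and the 56 and 48 byte figures assume a typical 64-bit host where int is 4 bytes and intptr_t is 8 bytes with 8-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the QEMU types used by TCGTemp. */
typedef enum { DEMO_TYPE_I32, DEMO_TYPE_I64 } DemoType;
typedef intptr_t demo_target_long;   /* assumed to be host-word sized */

/* Field order before the patch: mem_reg sits between two 8-byte members,
 * so a 64-bit host inserts 4 bytes of padding after it, and another
 * 4 bytes after the bit-fields to align the pointer. */
typedef struct TempBefore {
    DemoType base_type;
    DemoType type;
    int val_type;
    int reg;
    demo_target_long val;
    int mem_reg;
    intptr_t mem_offset;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;
    const char *name;
} TempBefore;

/* Field order after the patch: mem_reg and the bit-fields now share one
 * 8-byte slot, so both padding holes disappear. */
typedef struct TempAfter {
    DemoType base_type;
    DemoType type;
    int val_type;
    int reg;
    demo_target_long val;
    intptr_t mem_offset;
    int mem_reg;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;
    const char *name;
} TempAfter;

/* Moving a 4-byte member next to another 4-byte member cannot grow the struct. */
static_assert(sizeof(TempAfter) <= sizeof(TempBefore),
              "reordering must not grow the struct");

/* Bytes saved per element by the reordering (8 on a typical 64-bit host). */
static inline size_t temp_bytes_saved(void)
{
    return sizeof(TempBefore) - sizeof(TempAfter);
}
```

On a 32-bit host every member is 4 bytes wide, so both layouts have the same size there and the helper returns 0.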
* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
  2015-03-21 6:27 [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes Emilio G. Cota
@ 2015-03-23 21:42 ` Stefan Weil
  2015-03-24 1:07 ` Richard Henderson
  0 siblings, 1 reply; 8+ messages in thread
From: Stefan Weil @ 2015-03-23 21:42 UTC
  To: Emilio G. Cota, qemu-devel; +Cc: qemu-trivial, Richard Henderson

On 21.03.2015 at 07:27, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 48 bytes.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..3276924 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -429,8 +429,8 @@ typedef struct TCGTemp {
>      int val_type;
>      int reg;
>      tcg_target_long val;
> -    int mem_reg;
>      intptr_t mem_offset;
> +    int mem_reg;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;

Reviewed-by: Stefan Weil <sw@weilnetz.de>

TCGContext includes an array of TCGTemp, so it is even reduced by 4 KiB
(good for caching), and tcg.o now uses 55364 instead of 56116 bytes
(maybe faster, too).

Further optimizations are possible. TCGTemp can be reduced to 32 bytes,
as the output of pahole shows:

struct TCGTemp {
        TCGTempVal          val_type:8;         /*     0:24  4 */
        unsigned int        reg:8;              /*     0:16  4 */
        unsigned int        mem_reg:8;          /*     0: 8  4 */

        /* Bitfield combined with next fields */

        _Bool               fixed_reg:1;        /*     3: 7  1 */
        _Bool               mem_coherent:1;     /*     3: 6  1 */
        _Bool               mem_allocated:1;    /*     3: 5  1 */
        _Bool               temp_local:1;       /*     3: 4  1 */
        _Bool               temp_allocated:1;   /*     3: 3  1 */

        /* XXX 3 bits hole, try to pack */

        TCGType             base_type:16;       /*     4:16  4 */
        TCGType             type:16;            /*     4: 0  4 */
        tcg_target_long     val;                /*     8     8 */
        intptr_t            mem_offset;         /*    16     8 */
        const char *        name;               /*    24     8 */

        /* size: 32, cachelines: 1, members: 13 */
        /* bit holes: 1, sum bit holes: 3 bits */
        /* last cacheline: 32 bytes */
};

Here I used a new enum type for val_type and reduced some values to 8 or
16 bits. I also put the two most often used values at the beginning, so
they can be addressed without or with a small offset ("often" in the
code, no runtime data available).

Are such optimizations useful?

Stefan
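
The 4 KiB figure in the message above is per-element savings multiplied by the length of the TCGTemp array. A small sketch of that arithmetic follows; DEMO_MAX_TEMPS and array_bytes_saved are hypothetical names, and the array length of 512 is an assumption inferred from the 4 KiB figure (8 bytes x 512 = 4096), not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 512 temps per context, inferred from "reduced by 4 KiB". */
enum { DEMO_MAX_TEMPS = 512 };

/* The arithmetic behind the 4 KiB figure: 8 bytes saved per element. */
static_assert((56 - 48) * DEMO_MAX_TEMPS == 4096,
              "8 bytes per temp over 512 temps is 4 KiB");

/* Bytes saved across the whole array when one element shrinks. */
static inline size_t array_bytes_saved(size_t old_size, size_t new_size)
{
    return (old_size - new_size) * DEMO_MAX_TEMPS;
}
```

Under the same assumption, shrinking each element from 56 to 32 bytes would save 12 KiB per array.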
* Re: [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
  2015-03-23 21:42 ` Stefan Weil
@ 2015-03-24 1:07 ` Richard Henderson
  2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Henderson @ 2015-03-24 1:07 UTC
  To: Stefan Weil, Emilio G. Cota, qemu-devel; +Cc: qemu-trivial

On 03/23/2015 02:42 PM, Stefan Weil wrote:
> Further optimizations are possible. TCGTemp can be reduced to 32 bytes as the
> output of pahole shows:
>
> struct TCGTemp {
>         TCGTempVal          val_type:8;         /*     0:24  4 */

Need only be 2 bits.

>         unsigned int        reg:8;              /*     0:16  4 */
>         unsigned int        mem_reg:8;          /*     0: 8  4 */

Need only be 6 (ia64) bits, but an aligned 8-bit slot probably performs best.

>
>         /* Bitfield combined with next fields */
>
>         _Bool               fixed_reg:1;        /*     3: 7  1 */
>         _Bool               mem_coherent:1;     /*     3: 6  1 */
>         _Bool               mem_allocated:1;    /*     3: 5  1 */
>         _Bool               temp_local:1;       /*     3: 4  1 */
>         _Bool               temp_allocated:1;   /*     3: 3  1 */
>
>         /* XXX 3 bits hole, try to pack */
>
>         TCGType             base_type:16;       /*     4:16  4 */
>         TCGType             type:16;            /*     4: 0  4 */

Need only be 1 bit, honestly, but 2 bits might be easier to arrange.

Anyway, you're down to 23 bits from the word, or 16 bytes on a 32-bit host.
It's no better than the 32 bytes you got for a 64-bit host though.

>         tcg_target_long     val;                /*     8     8 */
>         intptr_t            mem_offset;         /*    16     8 */
>         const char *        name;               /*    24     8 */
>
>         /* size: 32, cachelines: 1, members: 13 */
>         /* bit holes: 1, sum bit holes: 3 bits */
>         /* last cacheline: 32 bytes */
> };
>
> Here I used a new enum type for val_type and reduced some values to 8 or 16 bit.
> I also put the two most often used values at the beginning, so they can be
> addressed without or with a small offset ("often" in the code, no runtime
> data available).
>
> Are such optimizations useful?

Yes, I think so. Especially because of the rather large arrays we build.


r~
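
The per-field widths quoted in the reply above follow from the number of distinct values each field has to hold. A hedged sketch of that calculation is below; bits_needed and check_thread_widths are hypothetical helpers, and the value counts (4 val_type states, 2 types, 64 registers for the 6-bit ia64 case) are readings of the thread rather than figures taken from the QEMU source.

```c
#include <assert.h>

/* Smallest bit-field width that can hold the values 0 .. count-1. */
static inline unsigned bits_needed(unsigned count)
{
    unsigned bits = 0;

    /* Stop at 32 so the shift below never reaches the width of unsigned. */
    while (bits < 32 && (1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* The widths mentioned in the thread, under the assumed value counts. */
static inline void check_thread_widths(void)
{
    assert(bits_needed(4) == 2);   /* val_type: DEAD, REG, MEM, CONST */
    assert(bits_needed(2) == 1);   /* type: I32, I64 */
    assert(bits_needed(64) == 6);  /* reg/mem_reg, assuming 64 registers */
}
```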
* [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-24 1:07 ` Richard Henderson
@ 2015-03-25 19:50 ` Emilio G. Cota
  2015-03-27 9:55 ` Alex Bennée
  2015-03-27 14:58 ` Richard Henderson
  0 siblings, 2 replies; 8+ messages in thread
From: Emilio G. Cota @ 2015-03-25 19:50 UTC
  To: Stefan Weil, Richard Henderson; +Cc: qemu-trivial, qemu-devel

This brings down the size of the struct from 56 to 32 bytes on 64-bit,
and to 16 bytes on 32-bit.

The appended patch adds macros to prevent us from mistakenly overflowing
the bitfields when more elements are added to the corresponding
enums/macros.

Note that reg/mem_reg need only 6 bits (for ia64) but for performance
it is probably better to align them to a byte address.

Given that TCGTemp is used in large arrays this leads to a few KBs
of savings. However, unpacking the bits takes additional code, so
the net effect depends on the target (host is x86_64):

Before:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
  37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
  39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
  40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
  39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o

After:
$ find . -name 'tcg.o' | xargs size
   text    data     bss     dec     hex filename
  41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
  37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
  39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
  40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
  39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o

Suggested-by: Stefan Weil <sw@weilnetz.de>
Suggested-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/tcg.h | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..71ae7b2 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
-    TCG_TYPE_COUNT, /* number of different types */
+    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
 
     /* An alias for the size of the host register. */
 #if TCG_TARGET_REG_BITS == 32
@@ -217,6 +217,9 @@ typedef enum TCGType {
 #endif
 } TCGType;
 
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS 1
+
 /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
 typedef enum TCGMemOp {
     MO_8 = 0,
@@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
 #define TEMP_VAL_REG 1
 #define TEMP_VAL_MEM 2
 #define TEMP_VAL_CONST 3
+#define TEMP_VAL_NR_BITS 2
 
-/* XXX: optimize memory layout */
 typedef struct TCGTemp {
-    TCGType base_type;
-    TCGType type;
-    int val_type;
-    int reg;
-    tcg_target_long val;
-    int mem_reg;
-    intptr_t mem_offset;
+    unsigned int reg:8;
+    unsigned int mem_reg:8;
+    unsigned int val_type:TEMP_VAL_NR_BITS;
+    unsigned int base_type:TCG_TYPE_NR_BITS;
+    unsigned int type:TCG_TYPE_NR_BITS;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
@@ -438,6 +439,9 @@ typedef struct TCGTemp {
                                   basic blocks. Otherwise, it is not
                                   preserved across basic blocks. */
     unsigned int temp_allocated:1; /* never used for code gen */
+
+    tcg_target_long val;
+    intptr_t mem_offset;
     const char *name;
 } TCGTemp;
 
-- 
1.9.1
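
The 32-byte and 16-byte sizes claimed in the patch above can be checked with a stand-alone copy of the packed layout. This is a sketch rather than the QEMU definition: TempPacked, the DEMO_* macros and temp_val_type are hypothetical names, intptr_t stands in for tcg_target_long, and the "four host words" result assumes a common ABI where unsigned int is 32 bits and intptr_t has pointer size and alignment.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TYPE_NR_BITS 1
#define DEMO_VAL_NR_BITS  2

/* 8 + 8 + 2 + 1 + 1 + 5 = 25 bits, so every bit-field fits in one unsigned int. */
static_assert(8 + 8 + DEMO_VAL_NR_BITS + 2 * DEMO_TYPE_NR_BITS + 5 <= 32,
              "all bit-fields must share one 32-bit unit");

typedef struct TempPacked {
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int val_type:DEMO_VAL_NR_BITS;
    unsigned int base_type:DEMO_TYPE_NR_BITS;
    unsigned int type:DEMO_TYPE_NR_BITS;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;

    intptr_t val;          /* stand-in for tcg_target_long */
    intptr_t mem_offset;
    const char *name;
} TempPacked;

/* Reading a packed field: the compiler has to shift and mask the shared
 * word, which is the extra code the commit message refers to. */
static inline unsigned temp_val_type(const TempPacked *t)
{
    return t->val_type;
}
```

With these assumptions the struct occupies one word of flags (padded to host-word size) plus three host-word members, which gives 32 bytes on a 64-bit host and 16 bytes on a 32-bit host.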
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
@ 2015-03-27 9:55 ` Alex Bennée
  2015-03-27 21:09 ` Emilio G. Cota
  2015-03-27 14:58 ` Richard Henderson
  1 sibling, 1 reply; 8+ messages in thread
From: Alex Bennée @ 2015-03-27 9:55 UTC
  To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, qemu-devel, Richard Henderson


Emilio G. Cota <cota@braap.org> writes:

> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.

Have you been able to measure any performance improvement with these new
structures? In theory, if aligned with cache lines, performance should
improve but real numbers would be nice.

>
> The appended patch adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.

I can see the defines but I can't see any checks. Should we be able to
do a compile time check if TCG_TYPE_COUNT doesn't fit into
TCG_TYPE_NR_BITS?

>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> it is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
>      /* An alias for the size of the host register. */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +
>  /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
>  typedef enum TCGMemOp {
>      MO_8 = 0,
> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG 1
>  #define TEMP_VAL_MEM 2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2

A similar compile time check could be added here.

>
> -/* XXX: optimize memory layout */
>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> @@ -438,6 +439,9 @@ typedef struct TCGTemp {
>                                    basic blocks. Otherwise, it is not
>                                    preserved across basic blocks. */
>      unsigned int temp_allocated:1; /* never used for code gen */
> +
> +    tcg_target_long val;
> +    intptr_t mem_offset;
>      const char *name;
>  } TCGTemp;

-- 
Alex Bennée
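
The compile-time check asked for in the review above could look roughly like the following. This is only a sketch using plain C11 static_assert; a project would normally use its own build-time assertion helper instead. All DEMO_* names and demo_type_fits are hypothetical and merely mirror the shape of the definitions in the patch.

```c
#include <assert.h>

typedef enum DemoType {
    DEMO_TYPE_I32,
    DEMO_TYPE_I64,
    DEMO_TYPE_COUNT,   /* number of different types */
} DemoType;

#define DEMO_TYPE_NR_BITS 1

#define DEMO_TEMP_VAL_DEAD  0
#define DEMO_TEMP_VAL_REG   1
#define DEMO_TEMP_VAL_MEM   2
#define DEMO_TEMP_VAL_CONST 3
#define DEMO_TEMP_VAL_NR_BITS 2

/* The build breaks if someone adds a type without widening the field. */
static_assert(DEMO_TYPE_COUNT <= (1 << DEMO_TYPE_NR_BITS),
              "DEMO_TYPE_NR_BITS is too small for DemoType");

/* Same idea for the value-type macros: the largest value must fit. */
static_assert(DEMO_TEMP_VAL_CONST < (1 << DEMO_TEMP_VAL_NR_BITS),
              "DEMO_TEMP_VAL_NR_BITS is too small");

/* Runtime counterpart of the first check, useful in debug assertions. */
static inline int demo_type_fits(int v)
{
    return v >= 0 && v < (1 << DEMO_TYPE_NR_BITS);
}
```

Adding a third enumerator before DEMO_TYPE_COUNT without raising DEMO_TYPE_NR_BITS would make the first assertion fail at build time, which is the behaviour the review asks for.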
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-27 9:55 ` Alex Bennée
@ 2015-03-27 21:09 ` Emilio G. Cota
  2015-03-30 9:55 ` Laurent Desnogues
  0 siblings, 1 reply; 8+ messages in thread
From: Emilio G. Cota @ 2015-03-27 21:09 UTC
  To: Alex Bennée, Richard Henderson; +Cc: qemu-trivial, Stefan Weil, qemu-devel

On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
> Have you been able to measure any performance improvement with these new
> structures? In theory, if aligned with cache lines, performance should
> improve but real numbers would be nice.

I haven't benchmarked anything, which makes me very uneasy. All
I've checked is that the system boots, and FWIW I appreciate no
difference in boot time.

Is there a benchmark suite to test TCG changes?

Until proper benchmarking I wouldn't want to see this merged. For now I
propose to merge the initial change (remove 8-byte hole in 64-bit),
which is uncontroversial.

> > The appended adds macros to prevent us from mistakenly overflowing
> > the bitfields when more elements are added to the corresponding
> > enums/macros.
>
> I can see the defines but I can't see any checks. Should we be able to
> do a compile time check if TCG_TYPE_COUNT doesn't fit into
> TCG_TYPE_NR_BITS?

> > +#define TEMP_VAL_NR_BITS 2
>
> A similar compile time check could be added here.

Ack, addressed below.

On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> > +#define TCG_TYPE_NR_BITS 1
>
> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
> even as (1 << TCG_TYPE_NR_BITS).
(snip)
> > +#define TEMP_VAL_NR_BITS 2
>
> And make this an enumeration.
>
> > typedef struct TCGTemp {
(snip)
> > + unsigned int base_type:TCG_TYPE_NR_BITS;
> > + unsigned int type:TCG_TYPE_NR_BITS;
>
> And do *not* change these from the enumeration to an unsigned int.
>
> I know why you did this -- to keep the compiler from warning that the TCGType
> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
> rather than an unrelated number. Except that's exactly the warning we want to
> keep, on the off-chance that someone modifies the enums without modifying the
> _NR_BITS defines.

Agreed, please see below.

Thanks,

		E.

[No signoff due to lack of provable perf improvement, see above.]

diff --git a/tcg/tcg.h b/tcg/tcg.h
index add7f75..afd3f94 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -193,7 +193,6 @@ typedef struct TCGPool {
 typedef enum TCGType {
     TCG_TYPE_I32,
     TCG_TYPE_I64,
-    TCG_TYPE_COUNT, /* number of different types */
 
     /* An alias for the size of the host register. */
 #if TCG_TARGET_REG_BITS == 32
@@ -217,6 +216,10 @@ typedef enum TCGType {
 #endif
 } TCGType;
 
+/* used for bitfield packing to save space */
+#define TCG_TYPE_NR_BITS 1
+#define TCG_TYPE_COUNT BIT(TCG_TYPE_NR_BITS)
+
 /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
 typedef enum TCGMemOp {
     MO_8 = 0,
@@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
     }
 }
 
-#define TEMP_VAL_DEAD 0
-#define TEMP_VAL_REG 1
-#define TEMP_VAL_MEM 2
-#define TEMP_VAL_CONST 3
+typedef enum TCGTempVal {
+    TEMP_VAL_DEAD,
+    TEMP_VAL_REG,
+    TEMP_VAL_MEM,
+    TEMP_VAL_CONST,
+} TCGTempVal;
+
+#define TEMP_VAL_NR_BITS 2
 
-/* XXX: optimize memory layout */
 typedef struct TCGTemp {
-    TCGType base_type;
-    TCGType type;
-    int val_type;
-    int reg;
-    tcg_target_long val;
-    int mem_reg;
-    intptr_t mem_offset;
+    unsigned int reg:8;
+    unsigned int mem_reg:8;
+    TCGTempVal val_type:TEMP_VAL_NR_BITS;
+    TCGType base_type:TCG_TYPE_NR_BITS;
+    TCGType type:TCG_TYPE_NR_BITS;
     unsigned int fixed_reg:1;
     unsigned int mem_coherent:1;
     unsigned int mem_allocated:1;
@@ -438,6 +442,9 @@ typedef struct TCGTemp {
                                   basic blocks. Otherwise, it is not
                                   preserved across basic blocks. */
     unsigned int temp_allocated:1; /* never used for code gen */
+
+    tcg_target_long val;
+    intptr_t mem_offset;
     const char *name;
 } TCGTemp;
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-27 21:09 ` Emilio G. Cota
@ 2015-03-30 9:55 ` Laurent Desnogues
  0 siblings, 0 replies; 8+ messages in thread
From: Laurent Desnogues @ 2015-03-30 9:55 UTC
  To: Emilio G. Cota
  Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel@nongnu.org, Richard Henderson

Hello,

On Fri, Mar 27, 2015 at 10:09 PM, Emilio G. Cota <cota@braap.org> wrote:
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote:
>> Have you been able to measure any performance improvement with these new
>> structures? In theory, if aligned with cache lines, performance should
>> improve but real numbers would be nice.
>
> I haven't benchmarked anything, which makes me very uneasy. All
> I've checked is that the system boots, and FWIW I appreciate no
> difference in boot time.
>
> Is there a benchmark suite to test TCG changes?
>
> Until proper benchmarking I wouldn't want to see this merged. For now I
> propose to merge the initial change (remove 8-byte hole in 64-bit),
> which is uncontroversial.

I tested the patch attached to your mail and saw no
performance difference on an ARM image booting Linux
and then running SunSpider with Google V8.

I also tested on one of the 176.gcc inputs with QEMU
ARM user mode and again saw no difference.

Thanks,

Laurent

>> > The appended patch adds macros to prevent us from mistakenly overflowing
>> > the bitfields when more elements are added to the corresponding
>> > enums/macros.
>>
>> I can see the defines but I can't see any checks. Should we be able to
>> do a compile time check if TCG_TYPE_COUNT doesn't fit into
>> TCG_TYPE_NR_BITS?
>
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> A similar compile time check could be added here.
>
> Ack, addressed below.
>
> On Fri, Mar 27, 2015 at 07:58:06 -0700, Richard Henderson wrote:
>> On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
>> > +#define TCG_TYPE_NR_BITS 1
>>
>> I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
>> even as (1 << TCG_TYPE_NR_BITS).
> (snip)
>> > +#define TEMP_VAL_NR_BITS 2
>>
>> And make this an enumeration.
>>
>> > typedef struct TCGTemp {
> (snip)
>> > + unsigned int base_type:TCG_TYPE_NR_BITS;
>> > + unsigned int type:TCG_TYPE_NR_BITS;
>>
>> And do *not* change these from the enumeration to an unsigned int.
>>
>> I know why you did this -- to keep the compiler from warning that the TCGType
>> enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
>> rather than an unrelated number. Except that's exactly the warning we want to
>> keep, on the off-chance that someone modifies the enums without modifying the
>> _NR_BITS defines.
>
> Agreed, please see below.
>
> Thanks,
>
>                 E.
>
> [No signoff due to lack of provable perf improvement, see above.]
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..afd3f94 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,6 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
>
>      /* An alias for the size of the host register. */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +216,10 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1
> +#define TCG_TYPE_COUNT BIT(TCG_TYPE_NR_BITS)
> +
>  /* Constants for qemu_ld and qemu_st for the Memory Operation field. */
>  typedef enum TCGMemOp {
>      MO_8 = 0,
> @@ -417,20 +420,21 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>      }
>  }
>
> -#define TEMP_VAL_DEAD 0
> -#define TEMP_VAL_REG 1
> -#define TEMP_VAL_MEM 2
> -#define TEMP_VAL_CONST 3
> +typedef enum TCGTempVal {
> +    TEMP_VAL_DEAD,
> +    TEMP_VAL_REG,
> +    TEMP_VAL_MEM,
> +    TEMP_VAL_CONST,
> +} TCGTempVal;
> +
> +#define TEMP_VAL_NR_BITS 2
>
> -/* XXX: optimize memory layout */
>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    TCGTempVal val_type:TEMP_VAL_NR_BITS;
> +    TCGType base_type:TCG_TYPE_NR_BITS;
> +    TCGType type:TCG_TYPE_NR_BITS;
>      unsigned int fixed_reg:1;
>      unsigned int mem_coherent:1;
>      unsigned int mem_allocated:1;
> @@ -438,6 +442,9 @@ typedef struct TCGTemp {
>                                    basic blocks. Otherwise, it is not
>                                    preserved across basic blocks. */
>      unsigned int temp_allocated:1; /* never used for code gen */
> +
> +    tcg_target_long val;
> +    intptr_t mem_offset;
>      const char *name;
>  } TCGTemp;
>
>
* Re: [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-25 19:50 ` [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
  2015-03-27 9:55 ` Alex Bennée
@ 2015-03-27 14:58 ` Richard Henderson
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Henderson @ 2015-03-27 14:58 UTC
  To: Emilio G. Cota, Stefan Weil; +Cc: qemu-trivial, qemu-devel

On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
>
> The appended patch adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
>
> Note that reg/mem_reg need only 6 bits (for ia64) but for performance
> it is probably better to align them to a byte address.
>
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
>
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
>
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
>
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>
>      /* An alias for the size of the host register. */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1

I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define. Perhaps
even as (1 << TCG_TYPE_NR_BITS).

> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG 1
>  #define TEMP_VAL_MEM 2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2

And make this an enumeration.

>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;

And do *not* change these from the enumeration to an unsigned int.

I know why you did this -- to keep the compiler from warning that the TCGType
enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
rather than an unrelated number. Except that's exactly the warning we want to
keep, on the off-chance that someone modifies the enums without modifying the
_NR_BITS defines.


r~
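
The review above objects to declaring the fields as plain unsigned int because a value that no longer fits is then dropped without any diagnostic. The sketch below only demonstrates that silent truncation, which is well-defined C behaviour for unsigned bit-fields; DemoFlags, store_type and demo_truncation_check are hypothetical names and this is not the QEMU struct.

```c
#include <assert.h>

typedef struct DemoFlags {
    unsigned int type:1;   /* 1-bit field, as with TCG_TYPE_NR_BITS == 1 */
} DemoFlags;

/* Store v in the 1-bit field and return what reads back. */
static inline unsigned store_type(unsigned v)
{
    DemoFlags f = { 0 };

    f.type = v;   /* unsigned bit-field: the value is reduced modulo 2 */
    return f.type;
}

/* A hypothetical third type with value 2 would read back as 0. */
static inline void demo_truncation_check(void)
{
    assert(store_type(0) == 0);
    assert(store_type(1) == 1);
    assert(store_type(2) == 0);
}
```

With an enum-typed bit-field, the compiler can instead flag at build time that the field is narrower than the enum's values, which is the warning the review wants to keep.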