* Re: [Qemu-trivial] [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp
@ 2015-03-29 21:52 ` Richard Henderson
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2015-03-29 21:52 UTC (permalink / raw)
  To: Emilio G. Cota; +Cc: qemu-trivial, Stefan Weil, Alex Bennée, qemu-devel

On Mar 27, 2015 14:09, "Emilio G. Cota" <cota@braap.org> wrote:
>
> On Fri, Mar 27, 2015 at 09:55:03 +0000, Alex Bennée wrote: 
> > Have you been able to measure any performance improvement with these new 
> > structures? In theory, if aligned with cache lines, performance should 
> > improve but real numbers would be nice. 
>
> I haven't benchmarked anything, which makes me very uneasy. All
> I've checked is that the system boots, and FWIW I see no
> difference in boot time.

No regression in boot time is good. We /know/ we're saving memory, after all.
 
>
> Is there a benchmark suite to test TCG changes? 

No, sorry. 


r~

* Re: [Qemu-trivial] [PATCH] tcg: optimise memory layout of TCGTemp
  2015-03-25 19:50 ` [Qemu-trivial] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
@ 2015-03-27 14:58 Richard Henderson
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2015-03-27 14:58 UTC (permalink / raw)
  To: Emilio G. Cota, Stefan Weil; +Cc: qemu-trivial, qemu-devel

On 03/25/2015 12:50 PM, Emilio G. Cota wrote:
> This brings down the size of the struct from 56 to 32 bytes on 64-bit,
> and to 16 bytes on 32-bit.
> 
> The appended patch adds macros to prevent us from mistakenly overflowing
> the bitfields when more elements are added to the corresponding
> enums/macros.
> 
> Note that reg/mem_reg need only 6 bits (for ia64), but for performance
> it is probably better to align them to a byte address.
> 
> Given that TCGTemp is used in large arrays this leads to a few KBs
> of savings. However, unpacking the bits takes additional code, so
> the net effect depends on the target (host is x86_64):
> 
> Before:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41131   29800      88   71019   1156b ./aarch64-softmmu/tcg/tcg.o
>   37969   29416      96   67481   10799 ./x86_64-linux-user/tcg/tcg.o
>   39354   28816      96   68266   10aaa ./arm-linux-user/tcg/tcg.o
>   40802   29096      88   69986   11162 ./arm-softmmu/tcg/tcg.o
>   39417   29672      88   69177   10e39 ./x86_64-softmmu/tcg/tcg.o
> 
> After:
> $ find . -name 'tcg.o' | xargs size
>    text    data     bss     dec     hex filename
>   41187   29800      88   71075   115a3 ./aarch64-softmmu/tcg/tcg.o
>   37777   29416      96   67289   106d9 ./x86_64-linux-user/tcg/tcg.o
>   39162   28816      96   68074   109ea ./arm-linux-user/tcg/tcg.o
>   40858   29096      88   70042   1119a ./arm-softmmu/tcg/tcg.o
>   39473   29672      88   69233   10e71 ./x86_64-softmmu/tcg/tcg.o
> 
> Suggested-by: Stefan Weil <sw@weilnetz.de>
> Suggested-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg/tcg.h | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index add7f75..71ae7b2 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -193,7 +193,7 @@ typedef struct TCGPool {
>  typedef enum TCGType {
>      TCG_TYPE_I32,
>      TCG_TYPE_I64,
> -    TCG_TYPE_COUNT, /* number of different types */
> +    TCG_TYPE_COUNT, /* number of different types, see TCG_TYPE_NR_BITS */
>  
>      /* An alias for the size of the host register.  */
>  #if TCG_TARGET_REG_BITS == 32
> @@ -217,6 +217,9 @@ typedef enum TCGType {
>  #endif
>  } TCGType;
>  
> +/* used for bitfield packing to save space */
> +#define TCG_TYPE_NR_BITS 1

I'd rather you moved TCG_TYPE_COUNT out of the enum and into a define.  Perhaps
even as (1 << TCG_TYPE_NR_BITS).
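
A minimal sketch of the suggestion above, assuming the count is derived from the bit width rather than listed as an enumerator. The table `tcg_type_size_bits` and the helper `tcg_type_bits` are hypothetical and exist only to show the define in use; this is not the actual tcg/tcg.h.

```c
#include <assert.h>

/* Hypothetical sketch: not the actual tcg/tcg.h contents. */
#define TCG_TYPE_NR_BITS 1                        /* width of a TCGType bitfield */
#define TCG_TYPE_COUNT   (1 << TCG_TYPE_NR_BITS)  /* number of different types */

typedef enum TCGType {
    TCG_TYPE_I32,
    TCG_TYPE_I64,
} TCGType;

/* The build fails if the last enumerator no longer fits in NR_BITS bits. */
_Static_assert(TCG_TYPE_I64 < TCG_TYPE_COUNT,
               "TCGType no longer fits in TCG_TYPE_NR_BITS bits");

/* Illustrative per-type table sized by the define (assumed, not from QEMU). */
static const int tcg_type_size_bits[TCG_TYPE_COUNT] = {
    [TCG_TYPE_I32] = 32,
    [TCG_TYPE_I64] = 64,
};

/* Look up the width of a type; asserts the value is in range. */
static int tcg_type_bits(TCGType t)
{
    assert(t < TCG_TYPE_COUNT);
    return tcg_type_size_bits[t];
}
```

With the count expressed as a define, the enum contains only real types, so a bitfield of `TCG_TYPE_NR_BITS` bits can hold every enumerator.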

> @@ -421,16 +424,14 @@ static inline TCGCond tcg_high_cond(TCGCond c)
>  #define TEMP_VAL_REG   1
>  #define TEMP_VAL_MEM   2
>  #define TEMP_VAL_CONST 3
> +#define TEMP_VAL_NR_BITS 2

And make this an enumeration.

>  typedef struct TCGTemp {
> -    TCGType base_type;
> -    TCGType type;
> -    int val_type;
> -    int reg;
> -    tcg_target_long val;
> -    int mem_reg;
> -    intptr_t mem_offset;
> +    unsigned int reg:8;
> +    unsigned int mem_reg:8;
> +    unsigned int val_type:TEMP_VAL_NR_BITS;
> +    unsigned int base_type:TCG_TYPE_NR_BITS;
> +    unsigned int type:TCG_TYPE_NR_BITS;

And do *not* change these from the enumeration to an unsigned int.

I know why you did this -- to keep the compiler from warning that the TCGType
enum didn't fit in the bitfield, because of TCG_TYPE_COUNT being an enumerator,
rather than an unrelated number.  Except that's exactly the warning we want to
keep, on the off-chance that someone modifies the enums without modifying the
_NR_BITS defines.
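
To make the point concrete, here is a rough sketch of enum-typed bitfields, assuming GCC or Clang (enum bitfields are implementation-defined in ISO C). The struct name `TempBits`, the helper `make_temp`, and the `TEMP_VAL_DEAD` enumerator with value 0 are assumptions for illustration; only the field names and the `_NR_BITS` macros come from the quoted patch.

```c
#include <assert.h>

/* Hypothetical sketch for GCC/Clang; not the actual QEMU definitions. */
#define TCG_TYPE_NR_BITS 1
#define TEMP_VAL_NR_BITS 2

typedef enum TCGType { TCG_TYPE_I32, TCG_TYPE_I64 } TCGType;

typedef enum TCGTempVal {
    TEMP_VAL_DEAD,   /* assumed value 0; not shown in the quoted hunk */
    TEMP_VAL_REG,
    TEMP_VAL_MEM,
    TEMP_VAL_CONST,
} TCGTempVal;

typedef struct TempBits {
    unsigned int reg:8;
    unsigned int mem_reg:8;
    /* Enum-typed fields: the compiler can compare the field width against
     * the enumerators and complain if an added value does not fit. */
    TCGTempVal val_type:TEMP_VAL_NR_BITS;
    TCGType base_type:TCG_TYPE_NR_BITS;
    TCGType type:TCG_TYPE_NR_BITS;
} TempBits;

/* Build a TempBits value; base_type and type are set to the same TCGType. */
static TempBits make_temp(TCGType type, TCGTempVal val_type, unsigned int reg)
{
    TempBits t = {
        .reg = reg,
        .val_type = val_type,
        .base_type = type,
        .type = type,
    };
    return t;
}
```

If a fifth enumerator were added to `TCGTempVal` without bumping `TEMP_VAL_NR_BITS`, GCC would be expected to flag the field as narrower than the values of its type; with `unsigned int` fields that check disappears.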


r~


* Re: [Qemu-trivial] [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes
@ 2015-03-24  1:07 Richard Henderson
  2015-03-25 19:50 ` [Qemu-trivial] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Henderson @ 2015-03-24  1:07 UTC (permalink / raw)
  To: Stefan Weil, Emilio G. Cota, qemu-devel; +Cc: qemu-trivial

On 03/23/2015 02:42 PM, Stefan Weil wrote:
> Further optimizations are possible. TCGTemp can be reduced to 32 bytes, as the
> output of pahole shows:
> 
> struct TCGTemp {
>         TCGTempVal                 val_type:8; /*     0:24  4 */

Need only be 2 bits.

>         unsigned int               reg:8; /*     0:16  4 */
>         unsigned int               mem_reg:8; /*     0: 8  4 */

Need only be 6 bits (for ia64), but an aligned 8-bit slot probably performs best.

> 
>         /* Bitfield combined with next fields */
> 
>         _Bool                      fixed_reg:1; /*     3: 7  1 */
>         _Bool                      mem_coherent:1; /*     3: 6  1 */
>         _Bool                      mem_allocated:1; /*     3: 5  1 */
>         _Bool                      temp_local:1; /*     3: 4  1 */
>         _Bool                      temp_allocated:1; /*     3: 3  1 */
> 
>         /* XXX 3 bits hole, try to pack */
> 
>         TCGType                    base_type:16; /*     4:16  4 */
>         TCGType                    type:16; /*     4: 0  4 */

Need only be 1 bit, honestly, but 2 bits might be easier to arrange.  Anyway,
you're down to 23 bits from the word, or 16 bytes on a 32-bit host.  It's no
better than the 32 bytes you got for a 64-bit host though.
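
The size arithmetic can be checked with a small stand-in struct. This is a sketch under stated assumptions: `PackedTemp` is a hypothetical name, `intptr_t` stands in for `tcg_target_long`, plain `unsigned int` fields are used for brevity, and the layout assumes a GCC/Clang-style ABI where all bitfields share one 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGTemp; field names follow the pahole listing. */
typedef struct PackedTemp {
    unsigned int val_type:2;
    unsigned int reg:8;
    unsigned int mem_reg:8;
    unsigned int fixed_reg:1;
    unsigned int mem_coherent:1;
    unsigned int mem_allocated:1;
    unsigned int temp_local:1;
    unsigned int temp_allocated:1;
    unsigned int base_type:1;
    unsigned int type:1;
    intptr_t val;          /* stand-in for tcg_target_long */
    intptr_t mem_offset;
    const char *name;
} PackedTemp;

/* One pointer-sized slot for the bitfield word (4 bytes plus padding on a
 * 64-bit host) followed by three pointer-sized members. */
static size_t expected_size(void)
{
    return 4 * sizeof(void *);
}

/* The first full-width member starts right after the bitfield slot. */
static size_t expected_val_offset(void)
{
    return sizeof(void *);
}
```

On a 32-bit host this gives 4 + 3 * 4 = 16 bytes. On a 64-bit host the bitfield word is padded to 8 bytes, giving 8 + 3 * 8 = 32 bytes, which is why narrowing the bitfields further does not shrink the 64-bit layout.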


>         tcg_target_long            val; /*     8     8 */
>         intptr_t                   mem_offset; /*    16     8 */
>         const char  *              name; /*    24     8 */
> 
>         /* size: 32, cachelines: 1, members: 13 */
>         /* bit holes: 1, sum bit holes: 3 bits */
>         /* last cacheline: 32 bytes */
> };
> 
> Here I used a new enum type for val_type and reduced some values to 8 or 16 bit.
> I also put the two most often used values at the beginning, so they can be
> addressed without or with a small offset ("often" in the code, no runtime
> data available).
> 
> Are such optimizations useful?

Yes, I think so.  Especially because of the rather large arrays we build.


r~



Thread overview: 17+ messages
2015-03-29 21:52 [Qemu-trivial] [Qemu-devel] [PATCH] tcg: optimise memory layout of TCGTemp Richard Henderson
2015-03-29 21:52 ` Richard Henderson
2015-03-30  5:33 ` [Qemu-trivial] " Stefan Weil
2015-03-30  5:33   ` Stefan Weil
2015-03-30  5:43 ` [Qemu-trivial] " Stefan Weil
2015-03-30  5:43   ` Stefan Weil
2015-04-03  0:07   ` [Qemu-trivial] [PATCH v2] " Emilio G. Cota
2015-04-03  0:07     ` [Qemu-devel] " Emilio G. Cota
2015-04-03  8:13     ` [Qemu-trivial] " Stefan Weil
2015-04-03  8:13       ` [Qemu-devel] " Stefan Weil
2015-04-03 14:17     ` [Qemu-trivial] " Richard Henderson
2015-04-03 14:17       ` [Qemu-devel] " Richard Henderson
2015-04-07 14:59     ` [Qemu-trivial] " Alex Bennée
2015-04-07 14:59       ` [Qemu-devel] " Alex Bennée
  -- strict thread matches above, loose matches on Subject: below --
2015-03-27 14:58 [Qemu-trivial] [PATCH] " Richard Henderson
2015-03-24  1:07 [Qemu-trivial] [Qemu-devel] [PATCH] tcg: pack TCGTemp to reduce size by 8 bytes Richard Henderson
2015-03-25 19:50 ` [Qemu-trivial] [PATCH] tcg: optimise memory layout of TCGTemp Emilio G. Cota
2015-03-27  9:55   ` [Qemu-trivial] [Qemu-devel] " Alex Bennée
2015-03-27 21:09     ` Emilio G. Cota
2015-03-30  9:55       ` Laurent Desnogues
