qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
@ 2018-03-07 10:03 Nikunj A Dadhania
  2018-03-15 20:02 ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Nikunj A Dadhania @ 2018-03-07 10:03 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Hi Richard,

I was working to get TCG vector support for PowerPC[1]. Started with
converting logical operations like vector AND/OR/XOR and compare
instructions. Found some inconsistency during my testing on x86 laptop
emulating PowerPC:

zero = 0000000000000000 0000000000000000 
max = ffffffffffffffff ffffffffffffffff 

1) tcg_gen_andc_vec - vandc in PPC

New API result:
andc(zero, max) -  (zero & ~max ) = 0000000000000000 0000000000000000
andc(max, zero) -  (max & ~zero ) = ffffffffffffffff ffffffffffffffff
andc(max, max)  -  (max & ~max ) = ffffffffffffffff ffffffffffffffff    -->WRONG
andc(zero, zero)-  (zero & ~zero ) = 0000000000000000 0000000000000000 

Expected result:
andc(zero, max)  (zero & ~max ) = 0000000000000000 0000000000000000 
andc(max, zero)  (max & ~zero ) = ffffffffffffffff ffffffffffffffff 
andc(max, max)   (max & ~max ) = 0000000000000000 0000000000000000 
andc(zero, zero) (zero & ~zero ) = 0000000000000000 0000000000000000 

2) tcg_gen_or_vec - vor in PPC

New API result:
(zero | max ) = 0000000000000000 0000000000000000   -----> WRONG
(max | max ) = ffffffffffffffff ffffffffffffffff 
(zero | zero ) = 0000000000000000 0000000000000000

Expected result:
(zero | max ) = ffffffffffffffff ffffffffffffffff 
(max | max ) = ffffffffffffffff ffffffffffffffff 
(zero | zero ) = 0000000000000000 0000000000000000 

3) tcg_gen_cmp_vec(TCG_COND_EQ) - vcmpequ* in PPC

New API result(all incorrect):

vcmpequb (zero == zero ) = 0000000000000000 0000000000000000 
vcmpequh (zero == zero ) = 0000000000000000 0000000000000000 
vcmpequw (zero == zero ) = 0000000000000000 0000000000000000 
vcmpequd (zero == zero ) = 0000000000000000 0000000000000000

Expected result:
vcmpequb (zero == zero ) = ffffffffffffffff ffffffffffffffff 
vcmpequh (zero == zero ) = ffffffffffffffff ffffffffffffffff 
vcmpequw (zero == zero ) = ffffffffffffffff ffffffffffffffff 
vcmpequd (zero == zero ) = ffffffffffffffff ffffffffffffffff

Do you see something that I am missing here ?

Regards,
Nikunj

1. PowerPC TCG vector infrastructure implementation
https://github.com/nikunjad/qemu/tree/ppc_vec_0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
  2018-03-07 10:03 [Qemu-devel] Using new TCG Vector infrastructure in PowerPC Nikunj A Dadhania
@ 2018-03-15 20:02 ` Richard Henderson
  2018-03-16  4:08   ` Nikunj A Dadhania
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2018-03-15 20:02 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-devel

On 03/07/2018 06:03 PM, Nikunj A Dadhania wrote:
> Hi Richard,
> 
> I was working to get TCG vector support for PowerPC[1]. Started with
> converting logical operations like vector AND/OR/XOR and compare
> instructions. Found some inconsistency during my testing on x86 laptop
> emulating PowerPC:

Great.

Well, the problem is that you cannot use TCG generic vectors and TCG global
variables to access the same memory.

Thus your first step must be to remove all references to cpu_avrh and cpu_avrl.
 These can be replaced by translator helpers that perform an explicit
tcg_gen_ld_i64 or tcg_gen_st_i64 to the proper memory locations.

Only after that's done can you begin converting other references to use the
host vectors.  Otherwise, the tcg optimizer will do Bad and Unpredictable
Things, which may well have produced the incorrect results that you saw.

I'll note that it's probably worth rearranging all of {fpr,vsr,avr} to the more
logical configuration presented by Power7 (?) such that it's one array of 64 x
128-bit registers.


r~

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
  2018-03-15 20:02 ` Richard Henderson
@ 2018-03-16  4:08   ` Nikunj A Dadhania
  2018-03-16  7:01     ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Nikunj A Dadhania @ 2018-03-16  4:08 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Richard Henderson <richard.henderson@linaro.org> writes:

> On 03/07/2018 06:03 PM, Nikunj A Dadhania wrote:
>> Hi Richard,
>> 
>> I was working to get TCG vector support for PowerPC[1]. Started with
>> converting logical operations like vector AND/OR/XOR and compare
>> instructions. Found some inconsistency during my testing on x86 laptop
>> emulating PowerPC:
>
> Great.
>
> Well, the problem is that you cannot use TCG generic vectors and TCG global
> variables to access the same memory.

Interesting, wasn't aware of this.

> Thus your first step must be to remove all references to cpu_avrh and cpu_avrl.
>  These can be replaced by translator helpers that perform an explicit
> tcg_gen_ld_i64 or tcg_gen_st_i64 to the proper memory locations.
>
> Only after that's done can you begin converting other references to use the
> host vectors.  Otherwise, the tcg optimizer will do Bad and Unpredictable
> Things, which may well have produced the incorrect results that you saw.
>
> I'll note that it's probably worth rearranging all of {fpr,vsr,avr} to the more
> logical configuration presented by Power7 (?) such that it's one array of 64 x
> 128-bit registers.

I have following for taking care of making VSRs contiguous 128bits. This
has touched lot of code even out of tcg directory. So I currently have
32 AVRs (128bits) and 32 VSRs (128 bits).

@@ -1026,8 +1027,8 @@ struct CPUPPCState {
 
     /* Floating point execution context */
     float_status fp_status;
-    /* floating point registers */
-    float64 fpr[32];
+    /* floating point registers multiplexed with vsr */
+
     /* floating point status and control register */
     target_ulong fpscr;
 
@@ -1078,8 +1079,8 @@ struct CPUPPCState {
     /* Altivec registers */
     ppc_avr_t avr[32];
     uint32_t vscr;
-    /* VSX registers */
-    uint64_t vsr[32];
+    /* 32 (128bit)- VSX registers */
+    ppc_avr_t vsr[32];
     /* SPE registers */
     uint64_t spe_acc;
     uint32_t spe_fscr;

Regards,
Nikunj

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
  2018-03-16  4:08   ` Nikunj A Dadhania
@ 2018-03-16  7:01     ` Richard Henderson
  2018-03-16  9:33       ` Nikunj A Dadhania
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2018-03-16  7:01 UTC (permalink / raw)
  To: Nikunj A Dadhania, qemu-devel

On 03/16/2018 12:08 PM, Nikunj A Dadhania wrote:
> @@ -1078,8 +1079,8 @@ struct CPUPPCState {
>      /* Altivec registers */
>      ppc_avr_t avr[32];
>      uint32_t vscr;
> -    /* VSX registers */
> -    uint64_t vsr[32];
> +    /* 32 (128bit)- VSX registers */
> +    ppc_avr_t vsr[32];

Another thing that needs to happen is to make ppc_avr_t to be 16-byte aligned
(this is documented in tcg-gvec-op.h, I believe).

This is easily accomplished by adding QEMU_ALIGNED(16) to the first union
member.  And then you'd like to put vsr adjacent to avr so that you're not
adding another alignment hole.


r~

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] Using new TCG Vector infrastructure in PowerPC
  2018-03-16  7:01     ` Richard Henderson
@ 2018-03-16  9:33       ` Nikunj A Dadhania
  0 siblings, 0 replies; 5+ messages in thread
From: Nikunj A Dadhania @ 2018-03-16  9:33 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

Richard Henderson <richard.henderson@linaro.org> writes:

> On 03/16/2018 12:08 PM, Nikunj A Dadhania wrote:
>> @@ -1078,8 +1079,8 @@ struct CPUPPCState {
>>      /* Altivec registers */
>>      ppc_avr_t avr[32];
>>      uint32_t vscr;
>> -    /* VSX registers */
>> -    uint64_t vsr[32];
>> +    /* 32 (128bit)- VSX registers */
>> +    ppc_avr_t vsr[32];
>
> Another thing that needs to happen is to make ppc_avr_t to be 16-byte aligned
> (this is documented in tcg-gvec-op.h, I believe).
>
> This is easily accomplished by adding QEMU_ALIGNED(16) to the first union
> member.  And then you'd like to put vsr adjacent to avr so that you're not
> adding another alignment hole.

Sure, will do that.

Regards
Nikunj

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-03-16  9:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-03-07 10:03 [Qemu-devel] Using new TCG Vector infrastructure in PowerPC Nikunj A Dadhania
2018-03-15 20:02 ` Richard Henderson
2018-03-16  4:08   ` Nikunj A Dadhania
2018-03-16  7:01     ` Richard Henderson
2018-03-16  9:33       ` Nikunj A Dadhania

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).