qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* chacha20-s390 broken in 8.2.0 in TCG on s390x
@ 2023-12-21 14:51 Michael Tokarev
  2024-01-03  0:22 ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Tokarev @ 2023-12-21 14:51 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Richard Henderson

When running current kernel on s390x in tcg mode *on s390x hw*, the following
is generated when loading crypto selftest module (it gets loaded automatically):

[   10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test vector 1, cfg="in-place (one sglist)"
[   10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
[   10.546969] ------------[ cut here ]------------
[   10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
[   10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8
[   10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) failover
[   10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1  Debian 6.5.13-1
[   10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux)
[   10.548101] Krnl PSW : 0704c00180000000 00000000005df8fe (alg_test+0x55e/0x5b8)
[   10.548207]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[   10.548291] Krnl GPRS: 0000000000000000 0000000001286408 00000000005df8fa 0000000001286408
[   10.548337]            000000000014bf14 00000000001c6ba8 0000000001838b3c 0000000000000005
[   10.548475]            00000000025a4880 00000000025a4800 ffffffffffffffea 00000000ffffffea
[   10.548521]            000000003e649200 00000000ffffffff 00000000005df8fa 000003800016bcf8
[   10.549504] Krnl Code: 00000000005df8ee: c020003b5828	larl	%r2,0000000000d4a93e
[   10.549504]            00000000005df8f4: c0e5ffdb62d2	brasl	%r14,000000000014be98
[   10.549504]           #00000000005df8fa: af000000		mc	0,0
[   10.549504]           >00000000005df8fe: a7f4fee6		brc	15,00000000005df6ca
[   10.549504]            00000000005df902: b9040042		lgr	%r4,%r2
[   10.549504]            00000000005df906: b9040039		lgr	%r3,%r9
[   10.549504]            00000000005df90a: c020003b57df	larl	%r2,0000000000d4a8c8
[   10.549504]            00000000005df910: 18bd		lr	%r11,%r13
[   10.550004] Call Trace:
[   10.550375]  [<00000000005df8fe>] alg_test+0x55e/0x5b8
[   10.550467] ([<00000000005df8fa>] alg_test+0x55a/0x5b8)
[   10.550489]  [<00000000005d9fbc>] cryptomgr_test+0x34/0x60
[   10.550514]  [<000000000017d004>] kthread+0x124/0x130
[   10.550539]  [<0000000000103124>] __ret_from_fork+0x3c/0x50
[   10.550562]  [<0000000000b1dfca>] ret_from_fork+0xa/0x30
[   10.550611] Last Breaking-Event-Address:
[   10.550626]  [<000000000014bf20>] __warn_printk+0x88/0x110
[   10.550723] ---[ end trace 0000000000000000 ]---

git bisect points to this commit:

commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Wed Aug 23 23:04:24 2023 -0700

     tcg/optimize: Optimize env memory operations

So far, this seems to work on amd64 host, but fails on s390x host -
where this has been observed so far.  Maybe it also fails in some
other combinations too, I don't yet know.  Just finished bisecting
it on s390x.

FWIW,

/mjt


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2023-12-21 14:51 chacha20-s390 broken in 8.2.0 in TCG on s390x Michael Tokarev
@ 2024-01-03  0:22 ` Richard Henderson
  2024-01-03  8:54   ` Michael Tokarev
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2024-01-03  0:22 UTC (permalink / raw)
  To: Michael Tokarev, QEMU Developers

On 12/22/23 01:51, Michael Tokarev wrote:
> When running current kernel on s390x in tcg mode *on s390x hw*, the following
> is generated when loading crypto selftest module (it gets loaded automatically):
> 
> [   10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test 
> vector 1, cfg="in-place (one sglist)"
> [   10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
> [   10.546969] ------------[ cut here ]------------
> [   10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22)
> [   10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8
> [   10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) 
> failover
> [   10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1  Debian 
> 6.5.13-1
> [   10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux)
> [   10.548101] Krnl PSW : 0704c00180000000 00000000005df8fe (alg_test+0x55e/0x5b8)
> [   10.548207]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
> [   10.548291] Krnl GPRS: 0000000000000000 0000000001286408 00000000005df8fa 0000000001286408
> [   10.548337]            000000000014bf14 00000000001c6ba8 0000000001838b3c 0000000000000005
> [   10.548475]            00000000025a4880 00000000025a4800 ffffffffffffffea 00000000ffffffea
> [   10.548521]            000000003e649200 00000000ffffffff 00000000005df8fa 000003800016bcf8
> [   10.549504] Krnl Code: 00000000005df8ee: c020003b5828    larl    %r2,0000000000d4a93e
> [   10.549504]            00000000005df8f4: c0e5ffdb62d2    brasl    %r14,000000000014be98
> [   10.549504]           #00000000005df8fa: af000000        mc    0,0
> [   10.549504]           >00000000005df8fe: a7f4fee6        brc    15,00000000005df6ca
> [   10.549504]            00000000005df902: b9040042        lgr    %r4,%r2
> [   10.549504]            00000000005df906: b9040039        lgr    %r3,%r9
> [   10.549504]            00000000005df90a: c020003b57df    larl    %r2,0000000000d4a8c8
> [   10.549504]            00000000005df910: 18bd        lr    %r11,%r13
> [   10.550004] Call Trace:
> [   10.550375]  [<00000000005df8fe>] alg_test+0x55e/0x5b8
> [   10.550467] ([<00000000005df8fa>] alg_test+0x55a/0x5b8)
> [   10.550489]  [<00000000005d9fbc>] cryptomgr_test+0x34/0x60
> [   10.550514]  [<000000000017d004>] kthread+0x124/0x130
> [   10.550539]  [<0000000000103124>] __ret_from_fork+0x3c/0x50
> [   10.550562]  [<0000000000b1dfca>] ret_from_fork+0xa/0x30
> [   10.550611] Last Breaking-Event-Address:
> [   10.550626]  [<000000000014bf20>] __warn_printk+0x88/0x110
> [   10.550723] ---[ end trace 0000000000000000 ]---
> 
> git bisect points to this commit:
> 
> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
> Author: Richard Henderson <richard.henderson@linaro.org>
> Date:   Wed Aug 23 23:04:24 2023 -0700
> 
>      tcg/optimize: Optimize env memory operations
> 
> So far, this seems to work on amd64 host, but fails on s390x host -
> where this has been observed so far.  Maybe it also fails in some
> other combinations too, I don't yet know.  Just finished bisecting
> it on s390x.

I haven't been able to build a reproducer for this.
Have you an image or kernel you can share?


r~



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03  0:22 ` Richard Henderson
@ 2024-01-03  8:54   ` Michael Tokarev
  2024-01-03 11:53     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Tokarev @ 2024-01-03  8:54 UTC (permalink / raw)
  To: Richard Henderson, QEMU Developers

03.01.2024 03:22, Richard Henderson wrote:
> On 12/22/23 01:51, Michael Tokarev wrote:
...
>> git bisect points to this commit:
>>
>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>> Author: Richard Henderson <richard.henderson@linaro.org>
>> Date:   Wed Aug 23 23:04:24 2023 -0700
>>
>>      tcg/optimize: Optimize env memory operations
>>
>> So far, this seems to work on amd64 host, but fails on s390x host -
>> where this has been observed so far.  Maybe it also fails in some
>> other combinations too, I don't yet know.  Just finished bisecting
>> it on s390x.
> 
> I haven't been able to build a reproducer for this.
> Have you an image or kernel you can share?

Sure.

Here's my actual testing "image": http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz

It contains vmlinuz and initrd - generated on a debian s390x system using standard
debian tools.

Actual command line I used when doing bisection:

  ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel 
vmlinuz -initrd initrd -snapshot

This command has unrelated stuff, one of which is using of vmlinuz as the hdd
image (in my initial test it was real filesystem image, but it doesn't really
matter), - I don't need this filesystem to be mounted, the prob is visible before
the mount when crypto modules are loaded.

All it needs is to load crypto stuff, - in particular it runs some selftests
at this point.

But please note once again: it works just fine on amd64 hw.  Where it breaks
is the actual s390x *host*, - I did all my tests on a debian s390x porterbox,
an actual s390x machine.

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03  8:54   ` Michael Tokarev
@ 2024-01-03 11:53     ` Philippe Mathieu-Daudé
  2024-01-03 14:01       ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-03 11:53 UTC (permalink / raw)
  To: Michael Tokarev, Richard Henderson, QEMU Developers; +Cc: qemu-s390x

Hi Richard,

On 3/1/24 09:54, Michael Tokarev wrote:
> 03.01.2024 03:22, Richard Henderson wrote:
>> On 12/22/23 01:51, Michael Tokarev wrote:
> ...
>>> git bisect points to this commit:
>>>
>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>> Date:   Wed Aug 23 23:04:24 2023 -0700
>>>
>>>      tcg/optimize: Optimize env memory operations
>>>
>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>> where this has been observed so far.  Maybe it also fails in some
>>> other combinations too, I don't yet know.  Just finished bisecting
>>> it on s390x.
>>
>> I haven't been able to build a reproducer for this.
>> Have you an image or kernel you can share?
> 
> Sure.
> 
> Here's my actual testing "image": 
> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
> 
> It contains vmlinuz and initrd - generated on a debian s390x system 
> using standard
> debian tools.
> 
> Actual command line I used when doing bisection:
> 
>   ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 
> 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel 
> vmlinuz -initrd initrd -snapshot

I had a quick look at the reproducer and reduced the code
area to:

void tcg_optimize(TCGContext *s)
{
     ...
         switch (opc) {
         case INDEX_op_ld_vec:
             done = fold_tcg_ld_memcopy(&ctx, op);


static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
{
     ...
     if (src && src->base_type == type) {
         return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
     }


static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, 
TCGArg src)
{
     ...
     switch (ctx->type) {
     case TCG_TYPE_V128:
         new_op = INDEX_op_mov_vec;


By disabling this optimization, the test succeeds.

Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
and remembering the constraints change on PPC LQ in
https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/
I wondered if LPQ constraints are correct, but I disabled
TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so
re-enabled).

Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
disappears.

The problematic chacha20 guest code could be:

Restarting code generation with smaller translation block (max 86 insns)
----------------
IN:
0x3ff80025a62:  eb67 f030 0024  stmg     %r6, %r7, 0x30(%r15)
0x3ff80025a68:  a719 ff60       lghi     %r1, -0xa0
0x3ff80025a6c:  b904 000f       lgr      %r0, %r15
0x3ff80025a70:  41f1 f000       la       %r15, 0(%r1, %r15)
0x3ff80025a74:  e300 f000 0024  stg      %r0, 0(%r15)
0x3ff80025a7a:  c070 0000 12c3  larl     %r7, -0x7ffd8000
0x3ff80025a80:  a708 000a       lhi      %r0, 0xa
0x3ff80025a84:  e789 5000 0c36  .byte    0xe7, 0x89, 0x50, 0x00, 0x0c, 0x36
0x3ff80025a8a:  e7a0 6000 0806  .byte    0xe7, 0xa0, 0x60, 0x00, 0x08, 0x06
0x3ff80025a90:  e7bf 7000 4c36  .byte    0xe7, 0xbf, 0x70, 0x00, 0x4c, 0x36
0x3ff80025a96:  e70b 0000 0456  .byte    0xe7, 0x0b, 0x00, 0x00, 0x04, 0x56
0x3ff80025a9c:  e718 0000 0456  .byte    0xe7, 0x18, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa2:  e74b 0000 0456  .byte    0xe7, 0x4b, 0x00, 0x00, 0x04, 0x56
0x3ff80025aa8:  e758 0000 0456  .byte    0xe7, 0x58, 0x00, 0x00, 0x04, 0x56
0x3ff80025aae:  e78b 0000 0456  .byte    0xe7, 0x8b, 0x00, 0x00, 0x04, 0x56
0x3ff80025ab4:  e798 0000 0456  .byte    0xe7, 0x98, 0x00, 0x00, 0x04, 0x56
0x3ff80025aba:  e7cb 0000 0456  .byte    0xe7, 0xcb, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac0:  e7d8 0000 0456  .byte    0xe7, 0xd8, 0x00, 0x00, 0x04, 0x56
0x3ff80025ac6:  e70b 0000 0c56  .byte    0xe7, 0x0b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025acc:  e718 0000 0c56  .byte    0xe7, 0x18, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad2:  e74b 0000 0c56  .byte    0xe7, 0x4b, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ad8:  e758 0000 0c56  .byte    0xe7, 0x58, 0x00, 0x00, 0x0c, 0x56
0x3ff80025ade:  e73a 0000 0456  .byte    0xe7, 0x3a, 0x00, 0x00, 0x04, 0x56
0x3ff80025ae4:  e77a c000 26f3  .byte    0xe7, 0x7a, 0xc0, 0x00, 0x26, 0xf3
0x3ff80025aea:  e7ba d000 26f3  .byte    0xe7, 0xba, 0xd0, 0x00, 0x26, 0xf3
0x3ff80025af0:  e7fa e000 26f3  .byte    0xe7, 0xfa, 0xe0, 0x00, 0x26, 0xf3
0x3ff80025af6:  e73b d000 2af3  .byte    0xe7, 0x3b, 0xd0, 0x00, 0x2a, 0xf3
0x3ff80025afc:  e77b e000 2af3  .byte    0xe7, 0x7b, 0xe0, 0x00, 0x2a, 0xf3
0x3ff80025b02:  e729 0000 0456  .byte    0xe7, 0x29, 0x00, 0x00, 0x04, 0x56
0x3ff80025b08:  e769 0000 0456  .byte    0xe7, 0x69, 0x00, 0x00, 0x04, 0x56
0x3ff80025b0e:  e7a9 0000 0456  .byte    0xe7, 0xa9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b14:  e7e9 0000 0456  .byte    0xe7, 0xe9, 0x00, 0x00, 0x04, 0x56
0x3ff80025b1a:  e729 0000 0c56  .byte    0xe7, 0x29, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b20:  e769 0000 0c56  .byte    0xe7, 0x69, 0x00, 0x00, 0x0c, 0x56
0x3ff80025b26:  e7c7 0000 0856  .byte    0xe7, 0xc7, 0x00, 0x00, 0x08, 0x56
0x3ff80025b2c:  e7db 0000 0856  .byte    0xe7, 0xdb, 0x00, 0x00, 0x08, 0x56
0x3ff80025b32:  e7ef 0000 0856  .byte    0xe7, 0xef, 0x00, 0x00, 0x08, 0x56
0x3ff80025b38:  e700 1000 20f3  .byte    0xe7, 0x00, 0x10, 0x00, 0x20, 0xf3
0x3ff80025b3e:  e744 5000 20f3  .byte    0xe7, 0x44, 0x50, 0x00, 0x20, 0xf3
0x3ff80025b44:  e788 9000 20f3  .byte    0xe7, 0x88, 0x90, 0x00, 0x20, 0xf3
0x3ff80025b4a:  e7cc d000 20f3  .byte    0xe7, 0xcc, 0xd0, 0x00, 0x20, 0xf3
0x3ff80025b50:  e700 1000 2ef3  .byte    0xe7, 0x00, 0x10, 0x00, 0x2e, 0xf3
0x3ff80025b56:  e744 5000 2ef3  .byte    0xe7, 0x44, 0x50, 0x00, 0x2e, 0xf3
0x3ff80025b5c:  e733 0000 006d  .byte    0xe7, 0x33, 0x00, 0x00, 0x00, 0x6d
0x3ff80025b62:  e777 4000 006d  .byte    0xe7, 0x77, 0x40, 0x00, 0x00, 0x6d
0x3ff80025b68:  e7bb 8000 006d  .byte    0xe7, 0xbb, 0x80, 0x00, 0x00, 0x6d
0x3ff80025b6e:  e7ff c000 006d  .byte    0xe7, 0xff, 0xc0, 0x00, 0x00, 0x6d
0x3ff80025b74:  e733 0000 0e6d  .byte    0xe7, 0x33, 0x00, 0x00, 0x0e, 0x6d
0x3ff80025b7a:  e777 4000 0e6d  .byte    0xe7, 0x77, 0x40, 0x00, 0x0e, 0x6d
0x3ff80025b80:  e733 0010 2033  .byte    0xe7, 0x33, 0x00, 0x10, 0x20, 0x33
0x3ff80025b86:  e777 0010 2033  .byte    0xe7, 0x77, 0x00, 0x10, 0x20, 0x33
0x3ff80025b8c:  e7bb 0010 2033  .byte    0xe7, 0xbb, 0x00, 0x10, 0x20, 0x33
0x3ff80025b92:  e7ff 0010 2033  .byte    0xe7, 0xff, 0x00, 0x10, 0x20, 0x33
0x3ff80025b98:  e733 0010 2c33  .byte    0xe7, 0x33, 0x00, 0x10, 0x2c, 0x33
0x3ff80025b9e:  e777 0010 2c33  .byte    0xe7, 0x77, 0x00, 0x10, 0x2c, 0x33
0x3ff80025ba4:  e722 3000 20f3  .byte    0xe7, 0x22, 0x30, 0x00, 0x20, 0xf3
0x3ff80025baa:  e766 7000 20f3  .byte    0xe7, 0x66, 0x70, 0x00, 0x20, 0xf3
0x3ff80025bb0:  e7aa b000 20f3  .byte    0xe7, 0xaa, 0xb0, 0x00, 0x20, 0xf3
0x3ff80025bb6:  e7ee f000 20f3  .byte    0xe7, 0xee, 0xf0, 0x00, 0x20, 0xf3
0x3ff80025bbc:  e722 3000 2ef3  .byte    0xe7, 0x22, 0x30, 0x00, 0x2e, 0xf3
0x3ff80025bc2:  e766 7000 2ef3  .byte    0xe7, 0x66, 0x70, 0x00, 0x2e, 0xf3
0x3ff80025bc8:  e711 2000 006d  .byte    0xe7, 0x11, 0x20, 0x00, 0x00, 0x6d
0x3ff80025bce:  e755 6000 006d  .byte    0xe7, 0x55, 0x60, 0x00, 0x00, 0x6d
0x3ff80025bd4:  e799 a000 006d  .byte    0xe7, 0x99, 0xa0, 0x00, 0x00, 0x6d
0x3ff80025bda:  e7dd e000 006d  .byte    0xe7, 0xdd, 0xe0, 0x00, 0x00, 0x6d
0x3ff80025be0:  e711 2000 0e6d  .byte    0xe7, 0x11, 0x20, 0x00, 0x0e, 0x6d
0x3ff80025be6:  e755 6000 0e6d  .byte    0xe7, 0x55, 0x60, 0x00, 0x0e, 0x6d
0x3ff80025bec:  e711 000c 2033  .byte    0xe7, 0x11, 0x00, 0x0c, 0x20, 0x33
0x3ff80025bf2:  e755 000c 2033  .byte    0xe7, 0x55, 0x00, 0x0c, 0x20, 0x33
0x3ff80025bf8:  e799 000c 2033  .byte    0xe7, 0x99, 0x00, 0x0c, 0x20, 0x33
0x3ff80025bfe:  e7dd 000c 2033  .byte    0xe7, 0xdd, 0x00, 0x0c, 0x20, 0x33
0x3ff80025c04:  e711 000c 2c33  .byte    0xe7, 0x11, 0x00, 0x0c, 0x2c, 0x33
0x3ff80025c0a:  e755 000c 2c33  .byte    0xe7, 0x55, 0x00, 0x0c, 0x2c, 0x33
0x3ff80025c10:  e700 1000 20f3  .byte    0xe7, 0x00, 0x10, 0x00, 0x20, 0xf3
0x3ff80025c16:  e744 5000 20f3  .byte    0xe7, 0x44, 0x50, 0x00, 0x20, 0xf3
0x3ff80025c1c:  e788 9000 20f3  .byte    0xe7, 0x88, 0x90, 0x00, 0x20, 0xf3
0x3ff80025c22:  e7cc d000 20f3  .byte    0xe7, 0xcc, 0xd0, 0x00, 0x20, 0xf3
0x3ff80025c28:  e700 1000 2ef3  .byte    0xe7, 0x00, 0x10, 0x00, 0x2e, 0xf3
0x3ff80025c2e:  e744 5000 2ef3  .byte    0xe7, 0x44, 0x50, 0x00, 0x2e, 0xf3
0x3ff80025c34:  e733 0000 006d  .byte    0xe7, 0x33, 0x00, 0x00, 0x00, 0x6d
0x3ff80025c3a:  e777 4000 006d  .byte    0xe7, 0x77, 0x40, 0x00, 0x00, 0x6d
0x3ff80025c40:  e7bb 8000 006d  .byte    0xe7, 0xbb, 0x80, 0x00, 0x00, 0x6d
0x3ff80025c46:  e7ff c000 006d  .byte    0xe7, 0xff, 0xc0, 0x00, 0x00, 0x6d
0x3ff80025c4c:  e733 0000 0e6d  .byte    0xe7, 0x33, 0x00, 0x00, 0x0e, 0x6d
0x3ff80025c52:  e777 4000 0e6d  .byte    0xe7, 0x77, 0x40, 0x00, 0x0e, 0x6d
0x3ff80025c58:  e733 0008 2033  .byte    0xe7, 0x33, 0x00, 0x08, 0x20, 0x33

Regards,

Phil.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03 11:53     ` Philippe Mathieu-Daudé
@ 2024-01-03 14:01       ` Philippe Mathieu-Daudé
  2024-01-03 14:37         ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-03 14:01 UTC (permalink / raw)
  To: Michael Tokarev, Richard Henderson, QEMU Developers
  Cc: qemu-s390x, David Hildenbrand

On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
> Hi Richard,
> 
> On 3/1/24 09:54, Michael Tokarev wrote:
>> 03.01.2024 03:22, Richard Henderson wrote:
>>> On 12/22/23 01:51, Michael Tokarev wrote:
>> ...
>>>> git bisect points to this commit:
>>>>
>>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>> Date:   Wed Aug 23 23:04:24 2023 -0700
>>>>
>>>>      tcg/optimize: Optimize env memory operations
>>>>
>>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>>> where this has been observed so far.  Maybe it also fails in some
>>>> other combinations too, I don't yet know.  Just finished bisecting
>>>> it on s390x.
>>>
>>> I haven't been able to build a reproducer for this.
>>> Have you an image or kernel you can share?
>>
>> Sure.
>>
>> Here's my actual testing "image": 
>> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
>>
>> It contains vmlinuz and initrd - generated on a debian s390x system 
>> using standard
>> debian tools.
>>
>> Actual command line I used when doing bisection:
>>
>>   ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic 
>> -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G 
>> -kernel vmlinuz -initrd initrd -snapshot
> 
> I had a quick look at the reproducer and reduced the code
> area to:
> 
> void tcg_optimize(TCGContext *s)
> {
>      ...
>          switch (opc) {
>          case INDEX_op_ld_vec:
>              done = fold_tcg_ld_memcopy(&ctx, op);
> 
> 
> static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op)
> {
>      ...
>      if (src && src->base_type == type) {
>          return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src));
>      }
> 
> 
> static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, 
> TCGArg src)
> {
>      ...
>      switch (ctx->type) {
>      case TCG_TYPE_V128:
>          new_op = INDEX_op_mov_vec;
> 
> 
> By disabling this optimization, the test succeeds.
> 
> Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store")
> and remembering the constraints change on PPC LQ in
> https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/
> I wondered if LPQ constraints are correct, but I disabled
> TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so
> re-enabled).
> 
> Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug
> disappears.

Reducing a bit further, it works when disabling rotli_vec opcode
(commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..5f147661e8 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType 
type, unsigned vece)
      case INDEX_op_orc_vec:
+        return 1;
      case INDEX_op_rotli_vec:
+        return TCG_TARGET_HAS_roti_vec;
      case INDEX_op_rotls_vec:
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e69b0d2ddd..5c18146a40 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
  #define TCG_TARGET_HAS_abs_vec        1
-#define TCG_TARGET_HAS_roti_vec       1
+#define TCG_TARGET_HAS_roti_vec       0
  #define TCG_TARGET_HAS_rots_vec       1
---


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03 14:01       ` Philippe Mathieu-Daudé
@ 2024-01-03 14:37         ` Philippe Mathieu-Daudé
  2024-01-03 22:51           ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-03 14:37 UTC (permalink / raw)
  To: Michael Tokarev, Richard Henderson, QEMU Developers
  Cc: qemu-s390x, David Hildenbrand

On 3/1/24 15:01, Philippe Mathieu-Daudé wrote:
> On 3/1/24 12:53, Philippe Mathieu-Daudé wrote:
>> Hi Richard,
>>
>> On 3/1/24 09:54, Michael Tokarev wrote:
>>> 03.01.2024 03:22, Richard Henderson wrote:
>>>> On 12/22/23 01:51, Michael Tokarev wrote:
>>> ...
>>>>> git bisect points to this commit:
>>>>>
>>>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f
>>>>> Author: Richard Henderson <richard.henderson@linaro.org>
>>>>> Date:   Wed Aug 23 23:04:24 2023 -0700
>>>>>
>>>>>      tcg/optimize: Optimize env memory operations
>>>>>
>>>>> So far, this seems to work on amd64 host, but fails on s390x host -
>>>>> where this has been observed so far.  Maybe it also fails in some
>>>>> other combinations too, I don't yet know.  Just finished bisecting
>>>>> it on s390x.
>>>>
>>>> I haven't been able to build a reproducer for this.
>>>> Have you an image or kernel you can share?
>>>
>>> Sure.
>>>
>>> Here's my actual testing "image": 
>>> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz
>>>
>>> It contains vmlinuz and initrd - generated on a debian s390x system 
>>> using standard
>>> debian tools.
>>>
>>> Actual command line I used when doing bisection:
>>>
>>>   ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic 
>>> -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G 
>>> -kernel vmlinuz -initrd initrd -snapshot


> Reducing a bit further, it works when disabling rotli_vec opcode
> (commit 22cb37b417 "tcg/s390x: Implement vector shift operations"):
> 
> ---
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index fbee43d3b0..5f147661e8 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType 
> type, unsigned vece)
>       case INDEX_op_orc_vec:
> +        return 1;
>       case INDEX_op_rotli_vec:
> +        return TCG_TARGET_HAS_roti_vec;
>       case INDEX_op_rotls_vec:
> diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
> index e69b0d2ddd..5c18146a40 100644
> --- a/tcg/s390x/tcg-target.h
> +++ b/tcg/s390x/tcg-target.h
> @@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3];
>   #define TCG_TARGET_HAS_abs_vec        1
> -#define TCG_TARGET_HAS_roti_vec       1
> +#define TCG_TARGET_HAS_roti_vec       0
>   #define TCG_TARGET_HAS_rots_vec       1
> ---

Finally changing the constraints on op_rotli_vec seems to fix it:

---
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index fbee43d3b0..b3456fe857 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex 
tcg_target_op_def(TCGOpcode op)
      case INDEX_op_ld_vec:
      case INDEX_op_dupm_vec:
+    case INDEX_op_rotli_vec:
          return C_O1_I1(v, r);
      case INDEX_op_dup_vec:
          return C_O1_I1(v, vr);
      case INDEX_op_abs_vec:
      case INDEX_op_neg_vec:
      case INDEX_op_not_vec:
-    case INDEX_op_rotli_vec:
      case INDEX_op_sari_vec:
      case INDEX_op_shli_vec:
      case INDEX_op_shri_vec:
      case INDEX_op_s390_vuph_vec:
      case INDEX_op_s390_vupl_vec:
          return C_O1_I1(v, v);
---

But I'm outside of my comfort zone so not really sure what I'm doing...
(I was inspired by the "the instruction verll only allows immediates up
to 32 bits." comment from
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03 14:37         ` Philippe Mathieu-Daudé
@ 2024-01-03 22:51           ` Richard Henderson
  2024-01-17 11:53             ` Michael Tokarev
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2024-01-03 22:51 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Developers
  Cc: qemu-s390x, David Hildenbrand

On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
> Finally changing the constraints on op_rotli_vec seems to fix it:
> 
> ---
> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
> index fbee43d3b0..b3456fe857 100644
> --- a/tcg/s390x/tcg-target.c.inc
> +++ b/tcg/s390x/tcg-target.c.inc
> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>       case INDEX_op_ld_vec:
>       case INDEX_op_dupm_vec:
> +    case INDEX_op_rotli_vec:
>           return C_O1_I1(v, r);
>       case INDEX_op_dup_vec:
>           return C_O1_I1(v, vr);
>       case INDEX_op_abs_vec:
>       case INDEX_op_neg_vec:
>       case INDEX_op_not_vec:
> -    case INDEX_op_rotli_vec:
>       case INDEX_op_sari_vec:
>       case INDEX_op_shli_vec:
>       case INDEX_op_shri_vec:
>       case INDEX_op_s390_vuph_vec:
>       case INDEX_op_s390_vupl_vec:
>           return C_O1_I1(v, v);

Definitely not correct, since VERLL requires a vector input to be rotated.

> But I'm outside of my comfort zone so not really sure what I'm doing...
> (I was inspired by the "the instruction verll only allows immediates up
> to 32 bits." comment from
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)

That message is confused.  The immediate in VERLL is 12 bits (with only 6 bits ever used 
for MO_64).  Dunno where "32 bits" comes from.


r~


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-03 22:51           ` Richard Henderson
@ 2024-01-17 11:53             ` Michael Tokarev
  2024-01-17 13:46               ` Alex Bennée
  2024-01-17 14:00               ` Philippe Mathieu-Daudé
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Tokarev @ 2024-01-17 11:53 UTC (permalink / raw)
  To: Richard Henderson, Philippe Mathieu-Daudé, QEMU Developers
  Cc: qemu-s390x, David Hildenbrand

04.01.2024 01:51, Richard Henderson :
> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>
>> ---
>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>> index fbee43d3b0..b3456fe857 100644
>> --- a/tcg/s390x/tcg-target.c.inc
>> +++ b/tcg/s390x/tcg-target.c.inc
>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>       case INDEX_op_ld_vec:
>>       case INDEX_op_dupm_vec:
>> +    case INDEX_op_rotli_vec:
>>           return C_O1_I1(v, r);
>>       case INDEX_op_dup_vec:
>>           return C_O1_I1(v, vr);
>>       case INDEX_op_abs_vec:
>>       case INDEX_op_neg_vec:
>>       case INDEX_op_not_vec:
>> -    case INDEX_op_rotli_vec:
>>       case INDEX_op_sari_vec:
>>       case INDEX_op_shli_vec:
>>       case INDEX_op_shri_vec:
>>       case INDEX_op_s390_vuph_vec:
>>       case INDEX_op_s390_vupl_vec:
>>           return C_O1_I1(v, v);
> 
> Definitely not correct, since VERLL requires a vector input to be rotated.
> 
>> But I'm outside of my comfort zone so not really sure what I'm doing...
>> (I was inspired by the "the instruction verll only allows immediates up
>> to 32 bits." comment from
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
> 
> That message is confused.  The immediate in VERLL is 12 bits (with only 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.

So, what do we have here in the end?
Should we fix this on qemu side?

This thread stopped quite some time ago, with problematic
instruction found but no solution..

Thanks,

/mjt



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-17 11:53             ` Michael Tokarev
@ 2024-01-17 13:46               ` Alex Bennée
  2024-01-17 14:00               ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 10+ messages in thread
From: Alex Bennée @ 2024-01-17 13:46 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Richard Henderson, Philippe Mathieu-Daudé, QEMU Developers,
	qemu-s390x, David Hildenbrand

Michael Tokarev <mjt@tls.msk.ru> writes:

> 04.01.2024 01:51, Richard Henderson :
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>>>       case INDEX_op_ld_vec:
>>>       case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>           return C_O1_I1(v, r);
>>>       case INDEX_op_dup_vec:
>>>           return C_O1_I1(v, vr);
>>>       case INDEX_op_abs_vec:
>>>       case INDEX_op_neg_vec:
>>>       case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>       case INDEX_op_sari_vec:
>>>       case INDEX_op_shli_vec:
>>>       case INDEX_op_shri_vec:
>>>       case INDEX_op_s390_vuph_vec:
>>>       case INDEX_op_s390_vupl_vec:
>>>           return C_O1_I1(v, v);
>> Definitely not correct, since VERLL requires a vector input to be
>> rotated.
>> 
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>> That message is confused.  The immediate in VERLL is 12 bits (with
>> only 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.
>
> So, what do we have here in the end?
> Should we fix this on qemu side?

I think the thinking is we should disable the optimisation for the 8.2
stable while figuring out the true fix for 9.0.

>
> This thread stopped quite some time ago, with problematic
> instruction found but no solution..
>
> Thanks,
>
> /mjt

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x
  2024-01-17 11:53             ` Michael Tokarev
  2024-01-17 13:46               ` Alex Bennée
@ 2024-01-17 14:00               ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-17 14:00 UTC (permalink / raw)
  To: Michael Tokarev, Richard Henderson, QEMU Developers
  Cc: qemu-s390x, David Hildenbrand

On 17/1/24 12:53, Michael Tokarev wrote:
> 04.01.2024 01:51, Richard Henderson :
>> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote:
>>> Finally changing the constraints on op_rotli_vec seems to fix it:
>>>
>>> ---
>>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
>>> index fbee43d3b0..b3456fe857 100644
>>> --- a/tcg/s390x/tcg-target.c.inc
>>> +++ b/tcg/s390x/tcg-target.c.inc
>>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex 
>>> tcg_target_op_def(TCGOpcode op)
>>>       case INDEX_op_ld_vec:
>>>       case INDEX_op_dupm_vec:
>>> +    case INDEX_op_rotli_vec:
>>>           return C_O1_I1(v, r);
>>>       case INDEX_op_dup_vec:
>>>           return C_O1_I1(v, vr);
>>>       case INDEX_op_abs_vec:
>>>       case INDEX_op_neg_vec:
>>>       case INDEX_op_not_vec:
>>> -    case INDEX_op_rotli_vec:
>>>       case INDEX_op_sari_vec:
>>>       case INDEX_op_shli_vec:
>>>       case INDEX_op_shri_vec:
>>>       case INDEX_op_s390_vuph_vec:
>>>       case INDEX_op_s390_vupl_vec:
>>>           return C_O1_I1(v, v);
>>
>> Definitely not correct, since VERLL requires a vector input to be 
>> rotated.
>>
>>> But I'm outside of my comfort zone so not really sure what I'm doing...
>>> (I was inspired by the "the instruction verll only allows immediates up
>>> to 32 bits." comment from
>>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html)
>>
>> That message is confused.  The immediate in VERLL is 12 bits (with 
>> only 6 bits ever used for MO_64).  Dunno where "32 bits" comes from.
> 
> So, what do we have here in the end?
> Should we fix this on qemu side?

Yes.

> This thread stopped quite some time ago, with problematic
> instruction found but no solution..

Be assured we are spending (too?) many hours on this...


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-01-17 14:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-21 14:51 chacha20-s390 broken in 8.2.0 in TCG on s390x Michael Tokarev
2024-01-03  0:22 ` Richard Henderson
2024-01-03  8:54   ` Michael Tokarev
2024-01-03 11:53     ` Philippe Mathieu-Daudé
2024-01-03 14:01       ` Philippe Mathieu-Daudé
2024-01-03 14:37         ` Philippe Mathieu-Daudé
2024-01-03 22:51           ` Richard Henderson
2024-01-17 11:53             ` Michael Tokarev
2024-01-17 13:46               ` Alex Bennée
2024-01-17 14:00               ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).