* chacha20-s390 broken in 8.2.0 in TCG on s390x @ 2023-12-21 14:51 Michael Tokarev 2024-01-03 0:22 ` Richard Henderson 0 siblings, 1 reply; 10+ messages in thread From: Michael Tokarev @ 2023-12-21 14:51 UTC (permalink / raw) To: QEMU Developers; +Cc: Richard Henderson When running current kernel on s390x in tcg mode *on s390x hw*, the following is generated when loading crypto selftest module (it gets loaded automatically): [ 10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test vector 1, cfg="in-place (one sglist)" [ 10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22) [ 10.546969] ------------[ cut here ]------------ [ 10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22) [ 10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8 [ 10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) failover [ 10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1 Debian 6.5.13-1 [ 10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux) [ 10.548101] Krnl PSW : 0704c00180000000 00000000005df8fe (alg_test+0x55e/0x5b8) [ 10.548207] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 10.548291] Krnl GPRS: 0000000000000000 0000000001286408 00000000005df8fa 0000000001286408 [ 10.548337] 000000000014bf14 00000000001c6ba8 0000000001838b3c 0000000000000005 [ 10.548475] 00000000025a4880 00000000025a4800 ffffffffffffffea 00000000ffffffea [ 10.548521] 000000003e649200 00000000ffffffff 00000000005df8fa 000003800016bcf8 [ 10.549504] Krnl Code: 00000000005df8ee: c020003b5828 larl %r2,0000000000d4a93e [ 10.549504] 00000000005df8f4: c0e5ffdb62d2 brasl %r14,000000000014be98 [ 10.549504] #00000000005df8fa: af000000 mc 0,0 [ 10.549504] >00000000005df8fe: a7f4fee6 brc 15,00000000005df6ca [ 10.549504] 00000000005df902: b9040042 lgr %r4,%r2 [ 10.549504] 00000000005df906: b9040039 lgr %r3,%r9 [ 10.549504] 00000000005df90a: c020003b57df larl %r2,0000000000d4a8c8 [ 10.549504] 00000000005df910: 18bd lr %r11,%r13 [ 10.550004] Call Trace: [ 10.550375] [<00000000005df8fe>] alg_test+0x55e/0x5b8 [ 10.550467] ([<00000000005df8fa>] alg_test+0x55a/0x5b8) [ 10.550489] [<00000000005d9fbc>] cryptomgr_test+0x34/0x60 [ 10.550514] [<000000000017d004>] kthread+0x124/0x130 [ 10.550539] [<0000000000103124>] __ret_from_fork+0x3c/0x50 [ 10.550562] [<0000000000b1dfca>] ret_from_fork+0xa/0x30 [ 10.550611] Last Breaking-Event-Address: [ 10.550626] [<000000000014bf20>] __warn_printk+0x88/0x110 [ 10.550723] ---[ end trace 0000000000000000 ]--- git bisect points to this commit: commit ab84dc398b3b702b0c692538b947ef65dbbdf52f Author: Richard Henderson <richard.henderson@linaro.org> Date: Wed Aug 23 23:04:24 2023 -0700 tcg/optimize: Optimize env memory operations So far, this seems to work on amd64 host, but fails on s390x host - where this has been observed so far. Maybe it also fails in some other combinations too, I don't yet know. Just finished bisecting it on s390x. FWIW, /mjt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2023-12-21 14:51 chacha20-s390 broken in 8.2.0 in TCG on s390x Michael Tokarev @ 2024-01-03 0:22 ` Richard Henderson 2024-01-03 8:54 ` Michael Tokarev 0 siblings, 1 reply; 10+ messages in thread From: Richard Henderson @ 2024-01-03 0:22 UTC (permalink / raw) To: Michael Tokarev, QEMU Developers On 12/22/23 01:51, Michael Tokarev wrote: > When running current kernel on s390x in tcg mode *on s390x hw*, the following > is generated when loading crypto selftest module (it gets loaded automatically): > > [ 10.546690] alg: skcipher: chacha20-s390 encryption test failed (wrong result) on test > vector 1, cfg="in-place (one sglist)" > [ 10.546914] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22) > [ 10.546969] ------------[ cut here ]------------ > [ 10.546998] alg: self-tests for chacha20 using chacha20-s390 failed (rc=-22) > [ 10.547182] WARNING: CPU: 1 PID: 109 at crypto/testmgr.c:5936 alg_test+0x55a/0x5b8 > [ 10.547510] Modules linked in: net_failover chacha_s390(+) libchacha virtio_blk(+) > failover > [ 10.547854] CPU: 1 PID: 109 Comm: cryptomgr_test Not tainted 6.5.0-5-s390x #1 Debian > 6.5.13-1 > [ 10.548002] Hardware name: QEMU 8561 QEMU (KVM/Linux) > [ 10.548101] Krnl PSW : 0704c00180000000 00000000005df8fe (alg_test+0x55e/0x5b8) > [ 10.548207] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > [ 10.548291] Krnl GPRS: 0000000000000000 0000000001286408 00000000005df8fa 0000000001286408 > [ 10.548337] 000000000014bf14 00000000001c6ba8 0000000001838b3c 0000000000000005 > [ 10.548475] 00000000025a4880 00000000025a4800 ffffffffffffffea 00000000ffffffea > [ 10.548521] 000000003e649200 00000000ffffffff 00000000005df8fa 000003800016bcf8 > [ 10.549504] Krnl Code: 00000000005df8ee: c020003b5828 larl %r2,0000000000d4a93e > [ 10.549504] 00000000005df8f4: c0e5ffdb62d2 brasl %r14,000000000014be98 > [ 10.549504] #00000000005df8fa: af000000 mc 0,0 > [ 10.549504] >00000000005df8fe: a7f4fee6 brc 15,00000000005df6ca > [ 10.549504] 00000000005df902: b9040042 lgr %r4,%r2 > [ 10.549504] 00000000005df906: b9040039 lgr %r3,%r9 > [ 10.549504] 00000000005df90a: c020003b57df larl %r2,0000000000d4a8c8 > [ 10.549504] 00000000005df910: 18bd lr %r11,%r13 > [ 10.550004] Call Trace: > [ 10.550375] [<00000000005df8fe>] alg_test+0x55e/0x5b8 > [ 10.550467] ([<00000000005df8fa>] alg_test+0x55a/0x5b8) > [ 10.550489] [<00000000005d9fbc>] cryptomgr_test+0x34/0x60 > [ 10.550514] [<000000000017d004>] kthread+0x124/0x130 > [ 10.550539] [<0000000000103124>] __ret_from_fork+0x3c/0x50 > [ 10.550562] [<0000000000b1dfca>] ret_from_fork+0xa/0x30 > [ 10.550611] Last Breaking-Event-Address: > [ 10.550626] [<000000000014bf20>] __warn_printk+0x88/0x110 > [ 10.550723] ---[ end trace 0000000000000000 ]--- > > git bisect points to this commit: > > commit ab84dc398b3b702b0c692538b947ef65dbbdf52f > Author: Richard Henderson <richard.henderson@linaro.org> > Date: Wed Aug 23 23:04:24 2023 -0700 > > tcg/optimize: Optimize env memory operations > > So far, this seems to work on amd64 host, but fails on s390x host - > where this has been observed so far. Maybe it also fails in some > other combinations too, I don't yet know. Just finished bisecting > it on s390x. I haven't been able to build a reproducer for this. Have you an image or kernel you can share? r~ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 0:22 ` Richard Henderson @ 2024-01-03 8:54 ` Michael Tokarev 2024-01-03 11:53 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 10+ messages in thread From: Michael Tokarev @ 2024-01-03 8:54 UTC (permalink / raw) To: Richard Henderson, QEMU Developers 03.01.2024 03:22, Richard Henderson wrote: > On 12/22/23 01:51, Michael Tokarev wrote: ... >> git bisect points to this commit: >> >> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f >> Author: Richard Henderson <richard.henderson@linaro.org> >> Date: Wed Aug 23 23:04:24 2023 -0700 >> >> tcg/optimize: Optimize env memory operations >> >> So far, this seems to work on amd64 host, but fails on s390x host - >> where this has been observed so far. Maybe it also fails in some >> other combinations too, I don't yet know. Just finished bisecting >> it on s390x. > > I haven't been able to build a reproducer for this. > Have you an image or kernel you can share? Sure. Here's my actual testing "image": http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz It contains vmlinuz and initrd - generated on a debian s390x system using standard debian tools. Actual command line I used when doing bisection: ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel vmlinuz -initrd initrd -snapshot This command has unrelated stuff, one of which is using of vmlinuz as the hdd image (in my initial test it was real filesystem image, but it doesn't really matter), - I don't need this filesystem to be mounted, the prob is visible before the mount when crypto modules are loaded. All it needs is to load crypto stuff, - in particular it runs some selftests at this point. But please note once again: it works just fine on amd64 hw. Where it breaks is the actual s390x *host*, - I did all my tests on a debian s390x porterbox, an actual s390x machine. Thanks, /mjt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 8:54 ` Michael Tokarev @ 2024-01-03 11:53 ` Philippe Mathieu-Daudé 2024-01-03 14:01 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 10+ messages in thread From: Philippe Mathieu-Daudé @ 2024-01-03 11:53 UTC (permalink / raw) To: Michael Tokarev, Richard Henderson, QEMU Developers; +Cc: qemu-s390x Hi Richard, On 3/1/24 09:54, Michael Tokarev wrote: > 03.01.2024 03:22, Richard Henderson wrote: >> On 12/22/23 01:51, Michael Tokarev wrote: > ... >>> git bisect points to this commit: >>> >>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f >>> Author: Richard Henderson <richard.henderson@linaro.org> >>> Date: Wed Aug 23 23:04:24 2023 -0700 >>> >>> tcg/optimize: Optimize env memory operations >>> >>> So far, this seems to work on amd64 host, but fails on s390x host - >>> where this has been observed so far. Maybe it also fails in some >>> other combinations too, I don't yet know. Just finished bisecting >>> it on s390x. >> >> I haven't been able to build a reproducer for this. >> Have you an image or kernel you can share? > > Sure. > > Here's my actual testing "image": > http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz > > It contains vmlinuz and initrd - generated on a debian s390x system > using standard > debian tools. > > Actual command line I used when doing bisection: > > ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic -smp > 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G -kernel > vmlinuz -initrd initrd -snapshot I had a quick look at the reproducer and reduced the code area to: void tcg_optimize(TCGContext *s) { ... switch (opc) { case INDEX_op_ld_vec: done = fold_tcg_ld_memcopy(&ctx, op); static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op) { ... if (src && src->base_type == type) { return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src)); } static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, TCGArg src) { ... switch (ctx->type) { case TCG_TYPE_V128: new_op = INDEX_op_mov_vec; By disabling this optimization, the test succeeds. Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store") and remembering the constraints change on PPC LQ in https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/ I wondered if LPQ constraints are correct, but I disabled TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so re-enabled). Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug disappears. The problematic chacha20 guest code could be: Restarting code generation with smaller translation block (max 86 insns) ---------------- IN: 0x3ff80025a62: eb67 f030 0024 stmg %r6, %r7, 0x30(%r15) 0x3ff80025a68: a719 ff60 lghi %r1, -0xa0 0x3ff80025a6c: b904 000f lgr %r0, %r15 0x3ff80025a70: 41f1 f000 la %r15, 0(%r1, %r15) 0x3ff80025a74: e300 f000 0024 stg %r0, 0(%r15) 0x3ff80025a7a: c070 0000 12c3 larl %r7, -0x7ffd8000 0x3ff80025a80: a708 000a lhi %r0, 0xa 0x3ff80025a84: e789 5000 0c36 .byte 0xe7, 0x89, 0x50, 0x00, 0x0c, 0x36 0x3ff80025a8a: e7a0 6000 0806 .byte 0xe7, 0xa0, 0x60, 0x00, 0x08, 0x06 0x3ff80025a90: e7bf 7000 4c36 .byte 0xe7, 0xbf, 0x70, 0x00, 0x4c, 0x36 0x3ff80025a96: e70b 0000 0456 .byte 0xe7, 0x0b, 0x00, 0x00, 0x04, 0x56 0x3ff80025a9c: e718 0000 0456 .byte 0xe7, 0x18, 0x00, 0x00, 0x04, 0x56 0x3ff80025aa2: e74b 0000 0456 .byte 0xe7, 0x4b, 0x00, 0x00, 0x04, 0x56 0x3ff80025aa8: e758 0000 0456 .byte 0xe7, 0x58, 0x00, 0x00, 0x04, 0x56 0x3ff80025aae: e78b 0000 0456 .byte 0xe7, 0x8b, 0x00, 0x00, 0x04, 0x56 0x3ff80025ab4: e798 0000 0456 .byte 0xe7, 0x98, 0x00, 0x00, 0x04, 0x56 0x3ff80025aba: e7cb 0000 0456 .byte 0xe7, 0xcb, 0x00, 0x00, 0x04, 0x56 0x3ff80025ac0: e7d8 0000 0456 .byte 0xe7, 0xd8, 0x00, 0x00, 0x04, 0x56 0x3ff80025ac6: e70b 0000 0c56 .byte 0xe7, 0x0b, 0x00, 0x00, 0x0c, 0x56 0x3ff80025acc: e718 0000 0c56 .byte 0xe7, 0x18, 0x00, 0x00, 0x0c, 0x56 0x3ff80025ad2: e74b 0000 0c56 .byte 0xe7, 0x4b, 0x00, 0x00, 0x0c, 0x56 0x3ff80025ad8: e758 0000 0c56 .byte 0xe7, 0x58, 0x00, 0x00, 0x0c, 0x56 0x3ff80025ade: e73a 0000 0456 .byte 0xe7, 0x3a, 0x00, 0x00, 0x04, 0x56 0x3ff80025ae4: e77a c000 26f3 .byte 0xe7, 0x7a, 0xc0, 0x00, 0x26, 0xf3 0x3ff80025aea: e7ba d000 26f3 .byte 0xe7, 0xba, 0xd0, 0x00, 0x26, 0xf3 0x3ff80025af0: e7fa e000 26f3 .byte 0xe7, 0xfa, 0xe0, 0x00, 0x26, 0xf3 0x3ff80025af6: e73b d000 2af3 .byte 0xe7, 0x3b, 0xd0, 0x00, 0x2a, 0xf3 0x3ff80025afc: e77b e000 2af3 .byte 0xe7, 0x7b, 0xe0, 0x00, 0x2a, 0xf3 0x3ff80025b02: e729 0000 0456 .byte 0xe7, 0x29, 0x00, 0x00, 0x04, 0x56 0x3ff80025b08: e769 0000 0456 .byte 0xe7, 0x69, 0x00, 0x00, 0x04, 0x56 0x3ff80025b0e: e7a9 0000 0456 .byte 0xe7, 0xa9, 0x00, 0x00, 0x04, 0x56 0x3ff80025b14: e7e9 0000 0456 .byte 0xe7, 0xe9, 0x00, 0x00, 0x04, 0x56 0x3ff80025b1a: e729 0000 0c56 .byte 0xe7, 0x29, 0x00, 0x00, 0x0c, 0x56 0x3ff80025b20: e769 0000 0c56 .byte 0xe7, 0x69, 0x00, 0x00, 0x0c, 0x56 0x3ff80025b26: e7c7 0000 0856 .byte 0xe7, 0xc7, 0x00, 0x00, 0x08, 0x56 0x3ff80025b2c: e7db 0000 0856 .byte 0xe7, 0xdb, 0x00, 0x00, 0x08, 0x56 0x3ff80025b32: e7ef 0000 0856 .byte 0xe7, 0xef, 0x00, 0x00, 0x08, 0x56 0x3ff80025b38: e700 1000 20f3 .byte 0xe7, 0x00, 0x10, 0x00, 0x20, 0xf3 0x3ff80025b3e: e744 5000 20f3 .byte 0xe7, 0x44, 0x50, 0x00, 0x20, 0xf3 0x3ff80025b44: e788 9000 20f3 .byte 0xe7, 0x88, 0x90, 0x00, 0x20, 0xf3 0x3ff80025b4a: e7cc d000 20f3 .byte 0xe7, 0xcc, 0xd0, 0x00, 0x20, 0xf3 0x3ff80025b50: e700 1000 2ef3 .byte 0xe7, 0x00, 0x10, 0x00, 0x2e, 0xf3 0x3ff80025b56: e744 5000 2ef3 .byte 0xe7, 0x44, 0x50, 0x00, 0x2e, 0xf3 0x3ff80025b5c: e733 0000 006d .byte 0xe7, 0x33, 0x00, 0x00, 0x00, 0x6d 0x3ff80025b62: e777 4000 006d .byte 0xe7, 0x77, 0x40, 0x00, 0x00, 0x6d 0x3ff80025b68: e7bb 8000 006d .byte 0xe7, 0xbb, 0x80, 0x00, 0x00, 0x6d 0x3ff80025b6e: e7ff c000 006d .byte 0xe7, 0xff, 0xc0, 0x00, 0x00, 0x6d 0x3ff80025b74: e733 0000 0e6d .byte 0xe7, 0x33, 0x00, 0x00, 0x0e, 0x6d 0x3ff80025b7a: e777 4000 0e6d .byte 0xe7, 0x77, 0x40, 0x00, 0x0e, 0x6d 0x3ff80025b80: e733 0010 2033 .byte 0xe7, 0x33, 0x00, 0x10, 0x20, 0x33 0x3ff80025b86: e777 0010 2033 .byte 0xe7, 0x77, 0x00, 0x10, 0x20, 0x33 0x3ff80025b8c: e7bb 0010 2033 .byte 0xe7, 0xbb, 0x00, 0x10, 0x20, 0x33 0x3ff80025b92: e7ff 0010 2033 .byte 0xe7, 0xff, 0x00, 0x10, 0x20, 0x33 0x3ff80025b98: e733 0010 2c33 .byte 0xe7, 0x33, 0x00, 0x10, 0x2c, 0x33 0x3ff80025b9e: e777 0010 2c33 .byte 0xe7, 0x77, 0x00, 0x10, 0x2c, 0x33 0x3ff80025ba4: e722 3000 20f3 .byte 0xe7, 0x22, 0x30, 0x00, 0x20, 0xf3 0x3ff80025baa: e766 7000 20f3 .byte 0xe7, 0x66, 0x70, 0x00, 0x20, 0xf3 0x3ff80025bb0: e7aa b000 20f3 .byte 0xe7, 0xaa, 0xb0, 0x00, 0x20, 0xf3 0x3ff80025bb6: e7ee f000 20f3 .byte 0xe7, 0xee, 0xf0, 0x00, 0x20, 0xf3 0x3ff80025bbc: e722 3000 2ef3 .byte 0xe7, 0x22, 0x30, 0x00, 0x2e, 0xf3 0x3ff80025bc2: e766 7000 2ef3 .byte 0xe7, 0x66, 0x70, 0x00, 0x2e, 0xf3 0x3ff80025bc8: e711 2000 006d .byte 0xe7, 0x11, 0x20, 0x00, 0x00, 0x6d 0x3ff80025bce: e755 6000 006d .byte 0xe7, 0x55, 0x60, 0x00, 0x00, 0x6d 0x3ff80025bd4: e799 a000 006d .byte 0xe7, 0x99, 0xa0, 0x00, 0x00, 0x6d 0x3ff80025bda: e7dd e000 006d .byte 0xe7, 0xdd, 0xe0, 0x00, 0x00, 0x6d 0x3ff80025be0: e711 2000 0e6d .byte 0xe7, 0x11, 0x20, 0x00, 0x0e, 0x6d 0x3ff80025be6: e755 6000 0e6d .byte 0xe7, 0x55, 0x60, 0x00, 0x0e, 0x6d 0x3ff80025bec: e711 000c 2033 .byte 0xe7, 0x11, 0x00, 0x0c, 0x20, 0x33 0x3ff80025bf2: e755 000c 2033 .byte 0xe7, 0x55, 0x00, 0x0c, 0x20, 0x33 0x3ff80025bf8: e799 000c 2033 .byte 0xe7, 0x99, 0x00, 0x0c, 0x20, 0x33 0x3ff80025bfe: e7dd 000c 2033 .byte 0xe7, 0xdd, 0x00, 0x0c, 0x20, 0x33 0x3ff80025c04: e711 000c 2c33 .byte 0xe7, 0x11, 0x00, 0x0c, 0x2c, 0x33 0x3ff80025c0a: e755 000c 2c33 .byte 0xe7, 0x55, 0x00, 0x0c, 0x2c, 0x33 0x3ff80025c10: e700 1000 20f3 .byte 0xe7, 0x00, 0x10, 0x00, 0x20, 0xf3 0x3ff80025c16: e744 5000 20f3 .byte 0xe7, 0x44, 0x50, 0x00, 0x20, 0xf3 0x3ff80025c1c: e788 9000 20f3 .byte 0xe7, 0x88, 0x90, 0x00, 0x20, 0xf3 0x3ff80025c22: e7cc d000 20f3 .byte 0xe7, 0xcc, 0xd0, 0x00, 0x20, 0xf3 0x3ff80025c28: e700 1000 2ef3 .byte 0xe7, 0x00, 0x10, 0x00, 0x2e, 0xf3 0x3ff80025c2e: e744 5000 2ef3 .byte 0xe7, 0x44, 0x50, 0x00, 0x2e, 0xf3 0x3ff80025c34: e733 0000 006d .byte 0xe7, 0x33, 0x00, 0x00, 0x00, 0x6d 0x3ff80025c3a: e777 4000 006d .byte 0xe7, 0x77, 0x40, 0x00, 0x00, 0x6d 0x3ff80025c40: e7bb 8000 006d .byte 0xe7, 0xbb, 0x80, 0x00, 0x00, 0x6d 0x3ff80025c46: e7ff c000 006d .byte 0xe7, 0xff, 0xc0, 0x00, 0x00, 0x6d 0x3ff80025c4c: e733 0000 0e6d .byte 0xe7, 0x33, 0x00, 0x00, 0x0e, 0x6d 0x3ff80025c52: e777 4000 0e6d .byte 0xe7, 0x77, 0x40, 0x00, 0x0e, 0x6d 0x3ff80025c58: e733 0008 2033 .byte 0xe7, 0x33, 0x00, 0x08, 0x20, 0x33 Regards, Phil. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 11:53 ` Philippe Mathieu-Daudé @ 2024-01-03 14:01 ` Philippe Mathieu-Daudé 2024-01-03 14:37 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 10+ messages in thread From: Philippe Mathieu-Daudé @ 2024-01-03 14:01 UTC (permalink / raw) To: Michael Tokarev, Richard Henderson, QEMU Developers Cc: qemu-s390x, David Hildenbrand On 3/1/24 12:53, Philippe Mathieu-Daudé wrote: > Hi Richard, > > On 3/1/24 09:54, Michael Tokarev wrote: >> 03.01.2024 03:22, Richard Henderson wrote: >>> On 12/22/23 01:51, Michael Tokarev wrote: >> ... >>>> git bisect points to this commit: >>>> >>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f >>>> Author: Richard Henderson <richard.henderson@linaro.org> >>>> Date: Wed Aug 23 23:04:24 2023 -0700 >>>> >>>> tcg/optimize: Optimize env memory operations >>>> >>>> So far, this seems to work on amd64 host, but fails on s390x host - >>>> where this has been observed so far. Maybe it also fails in some >>>> other combinations too, I don't yet know. Just finished bisecting >>>> it on s390x. >>> >>> I haven't been able to build a reproducer for this. >>> Have you an image or kernel you can share? >> >> Sure. >> >> Here's my actual testing "image": >> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz >> >> It contains vmlinuz and initrd - generated on a debian s390x system >> using standard >> debian tools. >> >> Actual command line I used when doing bisection: >> >> ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic >> -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G >> -kernel vmlinuz -initrd initrd -snapshot > > I had a quick look at the reproducer and reduced the code > area to: > > void tcg_optimize(TCGContext *s) > { > ... > switch (opc) { > case INDEX_op_ld_vec: > done = fold_tcg_ld_memcopy(&ctx, op); > > > static bool fold_tcg_ld_memcopy(OptContext *ctx, TCGOp *op) > { > ... > if (src && src->base_type == type) { > return tcg_opt_gen_mov(ctx, op, temp_arg(dst), temp_arg(src)); > } > > > static bool tcg_opt_gen_mov(OptContext *ctx, TCGOp *op, TCGArg dst, > TCGArg src) > { > ... > switch (ctx->type) { > case TCG_TYPE_V128: > new_op = INDEX_op_mov_vec; > > > By disabling this optimization, the test succeeds. > > Looking at commit 4caad79f8d ("tcg/s390x: Support 128-bit load/store") > and remembering the constraints change on PPC LQ in > https://lore.kernel.org/qemu-devel/20240102013456.131846-1-richard.henderson@linaro.org/ > I wondered if LPQ constraints are correct, but I disabled > TCG_TARGET_HAS_qemu_ldst_i128 and the bug persists (so > re-enabled). > > Then disabling TCG_TARGET_HAS_v64 and TCG_TARGET_HAS_v128 the bug > disappears. Reducing a bit further, it works when disabling rotli_vec opcode (commit 22cb37b417 "tcg/s390x: Implement vector shift operations"): --- diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc index fbee43d3b0..5f147661e8 100644 --- a/tcg/s390x/tcg-target.c.inc +++ b/tcg/s390x/tcg-target.c.inc @@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece) case INDEX_op_orc_vec: + return 1; case INDEX_op_rotli_vec: + return TCG_TARGET_HAS_roti_vec; case INDEX_op_rotls_vec: diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h index e69b0d2ddd..5c18146a40 100644 --- a/tcg/s390x/tcg-target.h +++ b/tcg/s390x/tcg-target.h @@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3]; #define TCG_TARGET_HAS_abs_vec 1 -#define TCG_TARGET_HAS_roti_vec 1 +#define TCG_TARGET_HAS_roti_vec 0 #define TCG_TARGET_HAS_rots_vec 1 --- ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 14:01 ` Philippe Mathieu-Daudé @ 2024-01-03 14:37 ` Philippe Mathieu-Daudé 2024-01-03 22:51 ` Richard Henderson 0 siblings, 1 reply; 10+ messages in thread From: Philippe Mathieu-Daudé @ 2024-01-03 14:37 UTC (permalink / raw) To: Michael Tokarev, Richard Henderson, QEMU Developers Cc: qemu-s390x, David Hildenbrand On 3/1/24 15:01, Philippe Mathieu-Daudé wrote: > On 3/1/24 12:53, Philippe Mathieu-Daudé wrote: >> Hi Richard, >> >> On 3/1/24 09:54, Michael Tokarev wrote: >>> 03.01.2024 03:22, Richard Henderson wrote: >>>> On 12/22/23 01:51, Michael Tokarev wrote: >>> ... >>>>> git bisect points to this commit: >>>>> >>>>> commit ab84dc398b3b702b0c692538b947ef65dbbdf52f >>>>> Author: Richard Henderson <richard.henderson@linaro.org> >>>>> Date: Wed Aug 23 23:04:24 2023 -0700 >>>>> >>>>> tcg/optimize: Optimize env memory operations >>>>> >>>>> So far, this seems to work on amd64 host, but fails on s390x host - >>>>> where this has been observed so far. Maybe it also fails in some >>>>> other combinations too, I don't yet know. Just finished bisecting >>>>> it on s390x. >>>> >>>> I haven't been able to build a reproducer for this. >>>> Have you an image or kernel you can share? >>> >>> Sure. >>> >>> Here's my actual testing "image": >>> http://www.corpit.ru/mjt/tmp/s390x-chacha.tar.gz >>> >>> It contains vmlinuz and initrd - generated on a debian s390x system >>> using standard >>> debian tools. >>> >>> Actual command line I used when doing bisection: >>> >>> ~/qemu/b/qemu-system-s390x -append "root=/dev/vda rw" -nographic >>> -smp 2 -drive format=raw,file=vmlinuz,if=virtio -no-user-config -m 1G >>> -kernel vmlinuz -initrd initrd -snapshot > Reducing a bit further, it works when disabling rotli_vec opcode > (commit 22cb37b417 "tcg/s390x: Implement vector shift operations"): > > --- > diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc > index fbee43d3b0..5f147661e8 100644 > --- a/tcg/s390x/tcg-target.c.inc > +++ b/tcg/s390x/tcg-target.c.inc > @@ -2918,3 +2918,5 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType > type, unsigned vece) > case INDEX_op_orc_vec: > + return 1; > case INDEX_op_rotli_vec: > + return TCG_TARGET_HAS_roti_vec; > case INDEX_op_rotls_vec: > diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h > index e69b0d2ddd..5c18146a40 100644 > --- a/tcg/s390x/tcg-target.h > +++ b/tcg/s390x/tcg-target.h > @@ -152,3 +152,3 @@ extern uint64_t s390_facilities[3]; > #define TCG_TARGET_HAS_abs_vec 1 > -#define TCG_TARGET_HAS_roti_vec 1 > +#define TCG_TARGET_HAS_roti_vec 0 > #define TCG_TARGET_HAS_rots_vec 1 > --- Finally changing the constraints on op_rotli_vec seems to fix it: --- diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc index fbee43d3b0..b3456fe857 100644 --- a/tcg/s390x/tcg-target.c.inc +++ b/tcg/s390x/tcg-target.c.inc @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_ld_vec: case INDEX_op_dupm_vec: + case INDEX_op_rotli_vec: return C_O1_I1(v, r); case INDEX_op_dup_vec: return C_O1_I1(v, vr); case INDEX_op_abs_vec: case INDEX_op_neg_vec: case INDEX_op_not_vec: - case INDEX_op_rotli_vec: case INDEX_op_sari_vec: case INDEX_op_shli_vec: case INDEX_op_shri_vec: case INDEX_op_s390_vuph_vec: case INDEX_op_s390_vupl_vec: return C_O1_I1(v, v); --- But I'm outside of my comfort zone so not really sure what I'm doing... (I was inspired by the "the instruction verll only allows immediates up to 32 bits." comment from https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html) ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 14:37 ` Philippe Mathieu-Daudé @ 2024-01-03 22:51 ` Richard Henderson 2024-01-17 11:53 ` Michael Tokarev 0 siblings, 1 reply; 10+ messages in thread From: Richard Henderson @ 2024-01-03 22:51 UTC (permalink / raw) To: Philippe Mathieu-Daudé, Michael Tokarev, QEMU Developers Cc: qemu-s390x, David Hildenbrand On 1/4/24 01:37, Philippe Mathieu-Daudé wrote: > Finally changing the constraints on op_rotli_vec seems to fix it: > > --- > diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc > index fbee43d3b0..b3456fe857 100644 > --- a/tcg/s390x/tcg-target.c.inc > +++ b/tcg/s390x/tcg-target.c.inc > @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) > case INDEX_op_ld_vec: > case INDEX_op_dupm_vec: > + case INDEX_op_rotli_vec: > return C_O1_I1(v, r); > case INDEX_op_dup_vec: > return C_O1_I1(v, vr); > case INDEX_op_abs_vec: > case INDEX_op_neg_vec: > case INDEX_op_not_vec: > - case INDEX_op_rotli_vec: > case INDEX_op_sari_vec: > case INDEX_op_shli_vec: > case INDEX_op_shri_vec: > case INDEX_op_s390_vuph_vec: > case INDEX_op_s390_vupl_vec: > return C_O1_I1(v, v); Definitely not correct, since VERLL requires a vector input to be rotated. > But I'm outside of my comfort zone so not really sure what I'm doing... > (I was inspired by the "the instruction verll only allows immediates up > to 32 bits." comment from > https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html) That message is confused. The immediate in VERLL is 12 bits (with only 6 bits ever used for MO_64). Dunno where "32 bits" comes from. r~ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-03 22:51 ` Richard Henderson @ 2024-01-17 11:53 ` Michael Tokarev 2024-01-17 13:46 ` Alex Bennée 2024-01-17 14:00 ` Philippe Mathieu-Daudé 0 siblings, 2 replies; 10+ messages in thread From: Michael Tokarev @ 2024-01-17 11:53 UTC (permalink / raw) To: Richard Henderson, Philippe Mathieu-Daudé, QEMU Developers Cc: qemu-s390x, David Hildenbrand 04.01.2024 01:51, Richard Henderson : > On 1/4/24 01:37, Philippe Mathieu-Daudé wrote: >> Finally changing the constraints on op_rotli_vec seems to fix it: >> >> --- >> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc >> index fbee43d3b0..b3456fe857 100644 >> --- a/tcg/s390x/tcg-target.c.inc >> +++ b/tcg/s390x/tcg-target.c.inc >> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) >> case INDEX_op_ld_vec: >> case INDEX_op_dupm_vec: >> + case INDEX_op_rotli_vec: >> return C_O1_I1(v, r); >> case INDEX_op_dup_vec: >> return C_O1_I1(v, vr); >> case INDEX_op_abs_vec: >> case INDEX_op_neg_vec: >> case INDEX_op_not_vec: >> - case INDEX_op_rotli_vec: >> case INDEX_op_sari_vec: >> case INDEX_op_shli_vec: >> case INDEX_op_shri_vec: >> case INDEX_op_s390_vuph_vec: >> case INDEX_op_s390_vupl_vec: >> return C_O1_I1(v, v); > > Definitely not correct, since VERLL requires a vector input to be rotated. > >> But I'm outside of my comfort zone so not really sure what I'm doing... >> (I was inspired by the "the instruction verll only allows immediates up >> to 32 bits." comment from >> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html) > > That message is confused. The immediate in VERLL is 12 bits (with only 6 bits ever used for MO_64). Dunno where "32 bits" comes from. So, what do we have here in the end? Should we fix this on qemu side? This thread stopped quite some time ago, with problematic instruction found but no solution.. Thanks, /mjt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-17 11:53 ` Michael Tokarev @ 2024-01-17 13:46 ` Alex Bennée 2024-01-17 14:00 ` Philippe Mathieu-Daudé 1 sibling, 0 replies; 10+ messages in thread From: Alex Bennée @ 2024-01-17 13:46 UTC (permalink / raw) To: Michael Tokarev Cc: Richard Henderson, Philippe Mathieu-Daudé, QEMU Developers, qemu-s390x, David Hildenbrand Michael Tokarev <mjt@tls.msk.ru> writes: > 04.01.2024 01:51, Richard Henderson : >> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote: >>> Finally changing the constraints on op_rotli_vec seems to fix it: >>> >>> --- >>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc >>> index fbee43d3b0..b3456fe857 100644 >>> --- a/tcg/s390x/tcg-target.c.inc >>> +++ b/tcg/s390x/tcg-target.c.inc >>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) >>> case INDEX_op_ld_vec: >>> case INDEX_op_dupm_vec: >>> + case INDEX_op_rotli_vec: >>> return C_O1_I1(v, r); >>> case INDEX_op_dup_vec: >>> return C_O1_I1(v, vr); >>> case INDEX_op_abs_vec: >>> case INDEX_op_neg_vec: >>> case INDEX_op_not_vec: >>> - case INDEX_op_rotli_vec: >>> case INDEX_op_sari_vec: >>> case INDEX_op_shli_vec: >>> case INDEX_op_shri_vec: >>> case INDEX_op_s390_vuph_vec: >>> case INDEX_op_s390_vupl_vec: >>> return C_O1_I1(v, v); >> Definitely not correct, since VERLL requires a vector input to be >> rotated. >> >>> But I'm outside of my comfort zone so not really sure what I'm doing... >>> (I was inspired by the "the instruction verll only allows immediates up >>> to 32 bits." comment from >>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html) >> That message is confused. The immediate in VERLL is 12 bits (with >> only 6 bits ever used for MO_64). Dunno where "32 bits" comes from. > > So, what do we have here in the end? > Should we fix this on qemu side? I think the thinking is we should disable the optimisation for the 8.2 stable while figuring out the true fix for 9.0. > > This thread stopped quite some time ago, with problematic > instruction found but no solution.. > > Thanks, > > /mjt -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: chacha20-s390 broken in 8.2.0 in TCG on s390x 2024-01-17 11:53 ` Michael Tokarev 2024-01-17 13:46 ` Alex Bennée @ 2024-01-17 14:00 ` Philippe Mathieu-Daudé 1 sibling, 0 replies; 10+ messages in thread From: Philippe Mathieu-Daudé @ 2024-01-17 14:00 UTC (permalink / raw) To: Michael Tokarev, Richard Henderson, QEMU Developers Cc: qemu-s390x, David Hildenbrand On 17/1/24 12:53, Michael Tokarev wrote: > 04.01.2024 01:51, Richard Henderson : >> On 1/4/24 01:37, Philippe Mathieu-Daudé wrote: >>> Finally changing the constraints on op_rotli_vec seems to fix it: >>> >>> --- >>> diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc >>> index fbee43d3b0..b3456fe857 100644 >>> --- a/tcg/s390x/tcg-target.c.inc >>> +++ b/tcg/s390x/tcg-target.c.inc >>> @@ -3264,13 +3264,13 @@ static TCGConstraintSetIndex >>> tcg_target_op_def(TCGOpcode op) >>> case INDEX_op_ld_vec: >>> case INDEX_op_dupm_vec: >>> + case INDEX_op_rotli_vec: >>> return C_O1_I1(v, r); >>> case INDEX_op_dup_vec: >>> return C_O1_I1(v, vr); >>> case INDEX_op_abs_vec: >>> case INDEX_op_neg_vec: >>> case INDEX_op_not_vec: >>> - case INDEX_op_rotli_vec: >>> case INDEX_op_sari_vec: >>> case INDEX_op_shli_vec: >>> case INDEX_op_shri_vec: >>> case INDEX_op_s390_vuph_vec: >>> case INDEX_op_s390_vupl_vec: >>> return C_O1_I1(v, v); >> >> Definitely not correct, since VERLL requires a vector input to be >> rotated. >> >>> But I'm outside of my comfort zone so not really sure what I'm doing... >>> (I was inspired by the "the instruction verll only allows immediates up >>> to 32 bits." comment from >>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg317099.html) >> >> That message is confused. The immediate in VERLL is 12 bits (with >> only 6 bits ever used for MO_64). Dunno where "32 bits" comes from. > > So, what do we have here in the end? > Should we fix this on qemu side? Yes. > This thread stopped quite some time ago, with problematic > instruction found but no solution.. Be assured we are spending (too?) many hours on this... ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-01-17 14:00 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-12-21 14:51 chacha20-s390 broken in 8.2.0 in TCG on s390x Michael Tokarev 2024-01-03 0:22 ` Richard Henderson 2024-01-03 8:54 ` Michael Tokarev 2024-01-03 11:53 ` Philippe Mathieu-Daudé 2024-01-03 14:01 ` Philippe Mathieu-Daudé 2024-01-03 14:37 ` Philippe Mathieu-Daudé 2024-01-03 22:51 ` Richard Henderson 2024-01-17 11:53 ` Michael Tokarev 2024-01-17 13:46 ` Alex Bennée 2024-01-17 14:00 ` Philippe Mathieu-Daudé
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).