* tpm-tis-device-swtpm-test timeout
@ 2024-10-15 18:11 Fabiano Rosas
2024-10-15 18:58 ` Stefan Berger
0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 18:11 UTC (permalink / raw)
To: qemu-devel; +Cc: Stefan Berger
Hi Stefan,
I see the tpm-tis-device-swtpm test timing out, could you take a look?
qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
135/138 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
(gdb) bt
#0 0x00005654e7ac4872 in tpm_tis_transfer (s=0x5654e7dec570, req=0x5654e7b42410 <tpm_pcrread> "\200\001", req_size=21, rsp=0x7ffd2295ca00 '\377' <repeats 200 times>..., rsp_size=1024) at ../tests/qtest/tpm-tis-util.c:490
#1 0x00005654e7ac1879 in tpm_util_pcrread (s=0x5654e7dec570, tx=0x5654e7ac467a <tpm_tis_transfer>, exp_resp=0x5654e7b42620 <tpm_pcrread_resp> "\200\001", exp_resp_size=63) at ../tests/qtest/tpm-util.c:96
#2 0xffffffffffffffff in ?? ()
...
(gdb) l
485 sts = qtest_readl(s, TIS_REG(0, TPM_TIS_REG_STS));
486 bcount = (sts >> 8) & 0xffff;
487
488 memset(rsp, 0, rsp_size);
489 for (i = 0; i < bcount; i++) {
490 rsp[i] = qtest_readb(s, TIS_REG(0, TPM_TIS_REG_DATA_FIFO));
491 }
492
493 /* relinquish use of locality 0 */
494 qtest_writeb(s, TIS_REG(0, TPM_TIS_REG_ACCESS),
(gdb) p rsp_size
$5 = 1024
(gdb) p bcount
$6 = 65535
Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 18:11 tpm-tis-device-swtpm-test timeout Fabiano Rosas
@ 2024-10-15 18:58 ` Stefan Berger
2024-10-15 19:22 ` Fabiano Rosas
0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 18:58 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
On 10/15/24 2:11 PM, Fabiano Rosas wrote:
> Hi Stefan,
>
> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>
> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
> 135/138 qemu:qtest+qtest-aarch64 /
> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
Is this something new or been happening for a while? Does it happen
consistently? I just build the master branch and ran the tests on a very
old machine, I mean 13 years old. The host runs Fedora 40 with
libtpms-0.96-6 + swtpm-0.9.0 from the distro:
104/473 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test OK 2.32s
2 subtests passed
I see some other non-TPM tests timing out but not this one. Same result
for TIS test on 2nd run.
104/473 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test OK 2.51s
2 subtests passed
3rd run:
104/473 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test OK 2.37s
2 subtests passed
4th run:
104/473 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test OK 2.63s
2 subtests passed
If I run it as a single test on the command line like this then it
consistently works here - I ran it >10 times now.
$ QTEST_QEMU_BINARY=build/qemu-system-aarch64
./build/tests/qtest/tpm-tis-device-swtpm-test
TAP version 13
# random seed: R02S8155b3cf024c2696e895e5f548396138
1..2
# Start of aarch64 tests
# Start of tpm tests
# Start of tis-swtpm tests
# starting QEMU: exec build/qemu-system-aarch64 -qtest
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon
chardev=char0,mode=control -display none -audio none -machine
virt,gic-version=max -accel tcg -chardev
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/sock
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev
-accel qtest
ok 1 /aarch64/tpm/tis-swtpm/test
# slow test /aarch64/tpm/tis-swtpm/test executed in 0.66 secs
# End of tis-swtpm tests
# Start of tis-swtpm-migration tests
# starting QEMU: exec build/qemu-system-aarch64 -qtest
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon
chardev=char0,mode=control -display none -audio none -machine
virt,gic-version=max -accel tcg -chardev
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/sock
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev
-accel qtest
# starting QEMU: exec build/qemu-system-aarch64 -qtest
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon
chardev=char0,mode=control -display none -audio none -machine
virt,gic-version=max -accel tcg -chardev
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.V9TOV2/sock
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev
-incoming unix:/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/migsocket
-accel qtest
ok 2 /aarch64/tpm/tis-swtpm-migration/test
# slow test /aarch64/tpm/tis-swtpm-migration/test executed in 1.10 secs
# End of tis-swtpm-migration tests
# End of tpm tests
# End of aarch64 tests
>
> (gdb) bt
> #0 0x00005654e7ac4872 in tpm_tis_transfer (s=0x5654e7dec570, req=0x5654e7b42410 <tpm_pcrread> "\200\001", req_size=21, rsp=0x7ffd2295ca00 '\377' <repeats 200 times>..., rsp_size=1024) at ../tests/qtest/tpm-tis-util.c:490
> #1 0x00005654e7ac1879 in tpm_util_pcrread (s=0x5654e7dec570, tx=0x5654e7ac467a <tpm_tis_transfer>, exp_resp=0x5654e7b42620 <tpm_pcrread_resp> "\200\001", exp_resp_size=63) at ../tests/qtest/tpm-util.c:96
> #2 0xffffffffffffffff in ?? ()
> ...
>
> (gdb) l
> 485 sts = qtest_readl(s, TIS_REG(0, TPM_TIS_REG_STS));
> 486 bcount = (sts >> 8) & 0xffff;
> 487
> 488 memset(rsp, 0, rsp_size);
> 489 for (i = 0; i < bcount; i++) {
> 490 rsp[i] = qtest_readb(s, TIS_REG(0, TPM_TIS_REG_DATA_FIFO));
> 491 }
> 492
> 493 /* relinquish use of locality 0 */
> 494 qtest_writeb(s, TIS_REG(0, TPM_TIS_REG_ACCESS),
> (gdb) p rsp_size
> $5 = 1024
> (gdb) p bcount
> $6 = 65535
>
> Thanks!
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 18:58 ` Stefan Berger
@ 2024-10-15 19:22 ` Fabiano Rosas
2024-10-15 19:32 ` Stefan Berger
0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 19:22 UTC (permalink / raw)
To: Stefan Berger, qemu-devel
Stefan Berger <stefanb@linux.ibm.com> writes:
> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>> Hi Stefan,
>>
>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>>
>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>> 135/138 qemu:qtest+qtest-aarch64 /
>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>
> Is this something new or been happening for a while? Does it happen
> consistently? I just build the master branch and ran the tests on a very
> old machine, I mean 13 years old. The host runs Fedora 40 with
> libtpms-0.96-6 + swtpm-0.9.0 from the distro:
It's the first time I see it. I can reproduce consistently by running
that test in a loop while make -j16 check is running in another window.
I'm on openSUSE Leap 15.5:
$ rpm -qa | grep libtpms
libtpms0-0.8.2-150300.3.9.1.x86_64
$ rpm -qa | grep swtpm
swtpm-0.7.3-150500.2.1.x86_64
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 19:22 ` Fabiano Rosas
@ 2024-10-15 19:32 ` Stefan Berger
2024-10-15 19:57 ` Fabiano Rosas
0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 19:32 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
On 10/15/24 3:22 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
>
>> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>>> Hi Stefan,
>>>
>>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>>>
>>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>>> 135/138 qemu:qtest+qtest-aarch64 /
>>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>>
>> Is this something new or been happening for a while? Does it happen
>> consistently? I just build the master branch and ran the tests on a very
>> old machine, I mean 13 years old. The host runs Fedora 40 with
>> libtpms-0.96-6 + swtpm-0.9.0 from the distro:
>
> It's the first time I see it. I can reproduce consistently by running
> that test in a loop while make -j16 check is running in another window.
So this here is failing for you every time?
QTEST_QEMU_BINARY=build/qemu-system-aarch64
./build/tests/qtest/tpm-tis-device-swtpm-test
This test would start swtpm. Do you see it running while this test is
running? It should show process like this running for a very short time
obviously [ while :; do ps aux | grep -v grep | grep swtpm; done ]:
stefanb 4087462 0.0 0.0 11040 3732 pts/6 S+ 15:30 0:00 swtpm
socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2 --ctrl
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2/sock --tpm2
stefanb 4087467 0.0 0.0 11040 3660 pts/6 S+ 15:30 0:00 swtpm
socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2 --ctrl
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2/sock --tpm2
>
> I'm on openSUSE Leap 15.5:
>
> $ rpm -qa | grep libtpms
> libtpms0-0.8.2-150300.3.9.1.x86_64
>
> $ rpm -qa | grep swtpm
> swtpm-0.7.3-150500.2.1.x86_64
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 19:32 ` Stefan Berger
@ 2024-10-15 19:57 ` Fabiano Rosas
2024-10-15 20:57 ` Stefan Berger
0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 19:57 UTC (permalink / raw)
To: Stefan Berger, qemu-devel
Stefan Berger <stefanb@linux.ibm.com> writes:
> On 10/15/24 3:22 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>
>>> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>>>> Hi Stefan,
>>>>
>>>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>>>>
>>>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>>>> 135/138 qemu:qtest+qtest-aarch64 /
>>>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>>>
>>> Is this something new or been happening for a while? Does it happen
>>> consistently? I just build the master branch and ran the tests on a very
>>> old machine, I mean 13 years old. The host runs Fedora 40 with
>>> libtpms-0.96-6 + swtpm-0.9.0 from the distro:
>>
>> It's the first time I see it. I can reproduce consistently by running
>> that test in a loop while make -j16 check is running in another window.
>
> So this here is failing for you every time?
>
> QTEST_QEMU_BINARY=build/qemu-system-aarch64
> ./build/tests/qtest/tpm-tis-device-swtpm-test
Sorry, I was unclear. No, that runs for about 30 iterations before it
fails. I just ran each of these in a terminal window:
$ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
$ make -j$(nproc) check
$ while :; do ps aux | grep -v grep | grep swtpm; done
The result is:
$ for i in $(seq 1 999); do echo "$i =============";
QTEST_QEMU_BINARY=./qemu-system-aarch64
./tests/qtest/tpm-tis-device-swtpm-test || break ; done
...
...
22 =============
# random seed: R02Se1d542a529b7d9bfdfdcaf9d5359cbb9
1..2
# Start of aarch64 tests
# Start of tpm tests
# Start of tis-swtpm tests
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev -accel qtest
ok 1 /aarch64/tpm/tis-swtpm/test
# End of tis-swtpm tests
# Start of tis-swtpm-migration tests
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev -accel qtest
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.QXOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev -incoming unix:/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/migsocket -accel qtest
Segmentation fault (core dumped)
$ swtpm: Data client disconnected
swtpm: Data client disconnected
>
> This test would start swtpm. Do you see it running while this test is
> running? It should show process like this running for a very short time
> obviously [ while :; do ps aux | grep -v grep | grep swtpm; done ]:
I see several of:
fabiano 30146 0.0 0.0 18056 5564 pts/9 S 15:11 0:00 swtpm socket
--tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.DQFLV2 --ctrl
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.DQFLV2/sock --tpm2
> stefanb 4087462 0.0 0.0 11040 3732 pts/6 S+ 15:30 0:00 swtpm
> socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2 --ctrl
> type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2/sock --tpm2
> stefanb 4087467 0.0 0.0 11040 3660 pts/6 S+ 15:30 0:00 swtpm
> socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2 --ctrl
> type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2/sock --tpm2
>
>
>>
>> I'm on openSUSE Leap 15.5:
>>
>> $ rpm -qa | grep libtpms
>> libtpms0-0.8.2-150300.3.9.1.x86_64
>>
>> $ rpm -qa | grep swtpm
>> swtpm-0.7.3-150500.2.1.x86_64
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 19:57 ` Fabiano Rosas
@ 2024-10-15 20:57 ` Stefan Berger
2024-10-15 22:02 ` Fabiano Rosas
0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 20:57 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
On 10/15/24 3:57 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
>
>>
>> So this here is failing for you every time?
>>
>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>
> Sorry, I was unclear. No, that runs for about 30 iterations before it
> fails. I just ran each of these in a terminal window:
>
> $ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
On my Fedora 40 host this command line here alone has been running for
250 loop iterations now and is still continuing.
> $ make -j$(nproc) check
So this needs to be run in parallel to the above command line to cause
the failure?
> $ while :; do ps aux | grep -v grep | grep swtpm; done
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 20:57 ` Stefan Berger
@ 2024-10-15 22:02 ` Fabiano Rosas
2024-10-15 23:35 ` Stefan Berger
0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 22:02 UTC (permalink / raw)
To: Stefan Berger, qemu-devel
Stefan Berger <stefanb@linux.ibm.com> writes:
> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>
>
>>>
>>> So this here is failing for you every time?
>>>
>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>
>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>> fails. I just ran each of these in a terminal window:
>>
>> $ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>
> On my Fedora 40 host this command line here alone has been running for
> 250 loop iterations now and is still continuing.
>
>> $ make -j$(nproc) check
>
> So this needs to be run in parallel to the above command line to cause
> the failure?
>
Yes, I've been using that method to reproduce live migration race
conditions as well. It's quite effective.
If you don't think you'll be able to find the root cause due to the
unreproducibility on your side, maybe we could at least add an assert
that bcount is not larger than rsp_size. I think that would at least
give an explicit error instead of a buffer overflow.
I can also try to dig deeper into this when I get some time. At the
moment I know nothing about the tpm device emulation.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 22:02 ` Fabiano Rosas
@ 2024-10-15 23:35 ` Stefan Berger
2024-10-16 1:52 ` Stefan Berger
2024-10-16 14:03 ` Fabiano Rosas
0 siblings, 2 replies; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 23:35 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
On 10/15/24 6:02 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
>
>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>>
>>
>>>>
>>>> So this here is failing for you every time?
>>>>
>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>
>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>> fails. I just ran each of these in a terminal window:
>>>
>>> $ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>
>> On my Fedora 40 host this command line here alone has been running for
>> 250 loop iterations now and is still continuing.
>>
>>> $ make -j$(nproc) check
>>
>> So this needs to be run in parallel to the above command line to cause
>> the failure?
>>
>
> Yes, I've been using that method to reproduce live migration race
> conditions as well. It's quite effective.
>
> If you don't think you'll be able to find the root cause due to the
> unreproducibility on your side, maybe we could at least add an assert
> that bcount is not larger than rsp_size. I think that would at least
> give an explicit error instead of a buffer overflow.
>
> I can also try to dig deeper into this when I get some time. At the
> moment I know nothing about the tpm device emulation.
>
The loop has run 3000 times by itself so that part is stable. However,
it seems there is some other test case that the loop cannot run in
parallel with. So, yes there is 'something'. ... ... Just having all
CPUs in a system busy requires waiting for migration to be complete on
the dst_qemu side as well. Can you try it with this patch:
diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index fb94496bbd..b52cd44841 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char
*src_tpm_path,
tpm_util_migrate(src_qemu, uri);
tpm_util_wait_for_migration_complete(src_qemu);
+ tpm_util_wait_for_migration_complete(dst_qemu);
tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
sizeof(tpm_pcrread_resp));
For me this fixes the issue I had seen where reading the STS register
was done too early before all the TPM TIS state was completely restored.
The active locality was -1 and STS return 0xffffffff and from then on
things went bad.
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 23:35 ` Stefan Berger
@ 2024-10-16 1:52 ` Stefan Berger
2024-10-16 14:03 ` Fabiano Rosas
1 sibling, 0 replies; 10+ messages in thread
From: Stefan Berger @ 2024-10-16 1:52 UTC (permalink / raw)
To: Fabiano Rosas, qemu-devel
On 10/15/24 7:35 PM, Stefan Berger wrote:
>
>
> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>
>>>
>>
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>>
>> If you don't think you'll be able to find the root cause due to the
>> unreproducibility on your side, maybe we could at least add an assert
>> that bcount is not larger than rsp_size. I think that would at least
>> give an explicit error instead of a buffer overflow.
>>
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>>
>
> The loop has run 3000 times by itself so that part is stable. However,
> it seems there is some other test case that the loop cannot run in
> parallel with. So, yes there is 'something'. ... ... Just having all
> CPUs in a system busy requires waiting for migration to be complete on
> the dst_qemu side as well. Can you try it with this patch:
>
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char
> *src_tpm_path,
>
> tpm_util_migrate(src_qemu, uri);
> tpm_util_wait_for_migration_complete(src_qemu);
> + tpm_util_wait_for_migration_complete(dst_qemu);
>
> tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
> sizeof(tpm_pcrread_resp));
>
Only waiting on the destination side is also possible:
diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index fb94496bbd..197714f8d9 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -114,7 +114,7 @@ void tpm_test_swtpm_migration_test(const char
*src_tpm_path,
sizeof(tpm_pcrread_resp));
tpm_util_migrate(src_qemu, uri);
- tpm_util_wait_for_migration_complete(src_qemu);
+ tpm_util_wait_for_migration_complete(dst_qemu);
tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
sizeof(tpm_pcrread_resp));
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: tpm-tis-device-swtpm-test timeout
2024-10-15 23:35 ` Stefan Berger
2024-10-16 1:52 ` Stefan Berger
@ 2024-10-16 14:03 ` Fabiano Rosas
1 sibling, 0 replies; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-16 14:03 UTC (permalink / raw)
To: Stefan Berger, qemu-devel
Stefan Berger <stefanb@linux.ibm.com> writes:
> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>
>>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>>>
>>>
>>>>>
>>>>> So this here is failing for you every time?
>>>>>
>>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>>
>>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>>> fails. I just ran each of these in a terminal window:
>>>>
>>>> $ for i in $(seq 1 999); do echo "$i ============="; QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>>
>>> On my Fedora 40 host this command line here alone has been running for
>>> 250 loop iterations now and is still continuing.
>>>
>>>> $ make -j$(nproc) check
>>>
>>> So this needs to be run in parallel to the above command line to cause
>>> the failure?
>>>
>>
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>>
>> If you don't think you'll be able to find the root cause due to the
>> unreproducibility on your side, maybe we could at least add an assert
>> that bcount is not larger than rsp_size. I think that would at least
>> give an explicit error instead of a buffer overflow.
>>
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>>
>
> The loop has run 3000 times by itself so that part is stable. However,
> it seems there is some other test case that the loop cannot run in
> parallel with. So, yes there is 'something'. ... ... Just having all
> CPUs in a system busy requires waiting for migration to be complete on
> the dst_qemu side as well. Can you try it with this patch:
>
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char
> *src_tpm_path,
>
> tpm_util_migrate(src_qemu, uri);
> tpm_util_wait_for_migration_complete(src_qemu);
> + tpm_util_wait_for_migration_complete(dst_qemu);
>
> tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
> sizeof(tpm_pcrread_resp));
>
>
> For me this fixes the issue I had seen where reading the STS register
> was done too early before all the TPM TIS state was completely restored.
> The active locality was -1 and STS return 0xffffffff and from then on
> things went bad.
Thanks, that fixes the issue. Could you send a patch please?
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-10-16 14:03 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-15 18:11 tpm-tis-device-swtpm-test timeout Fabiano Rosas
2024-10-15 18:58 ` Stefan Berger
2024-10-15 19:22 ` Fabiano Rosas
2024-10-15 19:32 ` Stefan Berger
2024-10-15 19:57 ` Fabiano Rosas
2024-10-15 20:57 ` Stefan Berger
2024-10-15 22:02 ` Fabiano Rosas
2024-10-15 23:35 ` Stefan Berger
2024-10-16 1:52 ` Stefan Berger
2024-10-16 14:03 ` Fabiano Rosas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).