qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* tpm-tis-device-swtpm-test timeout
@ 2024-10-15 18:11 Fabiano Rosas
  2024-10-15 18:58 ` Stefan Berger
  0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 18:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: Stefan Berger

Hi Stefan,

I see the tpm-tis-device-swtpm test timing out, could you take a look?

qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
135/138 qemu:qtest+qtest-aarch64 /
qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV

(gdb) bt
#0 0x00005654e7ac4872 in tpm_tis_transfer (s=0x5654e7dec570, req=0x5654e7b42410 <tpm_pcrread> "\200\001", req_size=21, rsp=0x7ffd2295ca00 '\377' <repeats 200 times>..., rsp_size=1024) at ../tests/qtest/tpm-tis-util.c:490
#1 0x00005654e7ac1879 in tpm_util_pcrread (s=0x5654e7dec570, tx=0x5654e7ac467a <tpm_tis_transfer>, exp_resp=0x5654e7b42620 <tpm_pcrread_resp> "\200\001", exp_resp_size=63) at ../tests/qtest/tpm-util.c:96
#2 0xffffffffffffffff in ?? ()
...

(gdb) l
485         sts = qtest_readl(s, TIS_REG(0, TPM_TIS_REG_STS));
486         bcount = (sts >> 8) & 0xffff;
487
488         memset(rsp, 0, rsp_size);
489         for (i = 0; i < bcount; i++) {
490             rsp[i] = qtest_readb(s, TIS_REG(0, TPM_TIS_REG_DATA_FIFO));
491         }
492
493         /* relinquish use of locality 0 */
494         qtest_writeb(s, TIS_REG(0, TPM_TIS_REG_ACCESS),
(gdb) p rsp_size
$5 = 1024
(gdb) p bcount
$6 = 65535

Thanks!


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 18:11 tpm-tis-device-swtpm-test timeout Fabiano Rosas
@ 2024-10-15 18:58 ` Stefan Berger
  2024-10-15 19:22   ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 18:58 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel



On 10/15/24 2:11 PM, Fabiano Rosas wrote:
> Hi Stefan,
> 
> I see the tpm-tis-device-swtpm test timing out, could you take a look?
> 
> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
> 135/138 qemu:qtest+qtest-aarch64 /
> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV

Is this something new or been happening for a while? Does it happen 
consistently? I just build the master branch and ran the tests on a very 
old machine, I mean 13 years old. The host runs Fedora 40 with 
libtpms-0.96-6 + swtpm-0.9.0 from the distro:

104/473 qemu:qtest+qtest-aarch64 / 
qtest-aarch64/tpm-tis-device-swtpm-test           OK               2.32s 
   2 subtests passed

I see some other non-TPM tests timing out but not this one. Same result 
for TIS test on 2nd run.

104/473 qemu:qtest+qtest-aarch64 / 
qtest-aarch64/tpm-tis-device-swtpm-test           OK               2.51s 
   2 subtests passed

3rd run:

104/473 qemu:qtest+qtest-aarch64 / 
qtest-aarch64/tpm-tis-device-swtpm-test           OK               2.37s 
   2 subtests passed

4th run:

104/473 qemu:qtest+qtest-aarch64 / 
qtest-aarch64/tpm-tis-device-swtpm-test           OK               2.63s 
   2 subtests passed



If I run it as a single test on the command line like this then it 
consistently works here - I ran it >10 times now.

$ QTEST_QEMU_BINARY=build/qemu-system-aarch64 
./build/tests/qtest/tpm-tis-device-swtpm-test
TAP version 13
# random seed: R02S8155b3cf024c2696e895e5f548396138
1..2
# Start of aarch64 tests
# Start of tpm tests
# Start of tis-swtpm tests
# starting QEMU: exec build/qemu-system-aarch64 -qtest 
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -machine 
virt,gic-version=max -accel tcg -chardev 
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/sock 
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev 
-accel qtest
ok 1 /aarch64/tpm/tis-swtpm/test
# slow test /aarch64/tpm/tis-swtpm/test executed in 0.66 secs
# End of tis-swtpm tests
# Start of tis-swtpm-migration tests
# starting QEMU: exec build/qemu-system-aarch64 -qtest 
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -machine 
virt,gic-version=max -accel tcg -chardev 
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/sock 
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev 
-accel qtest
# starting QEMU: exec build/qemu-system-aarch64 -qtest 
unix:/tmp/qtest-3738308.sock -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-3738308.qmp,id=char0 -mon 
chardev=char0,mode=control -display none -audio none -machine 
virt,gic-version=max -accel tcg -chardev 
socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.V9TOV2/sock 
-tpmdev emulator,id=dev,chardev=chr -device tpm-tis-device,tpmdev=dev 
-incoming unix:/tmp/qemu-tpm-tis-device-swtpm-test.YEUOV2/migsocket 
-accel qtest
ok 2 /aarch64/tpm/tis-swtpm-migration/test
# slow test /aarch64/tpm/tis-swtpm-migration/test executed in 1.10 secs
# End of tis-swtpm-migration tests
# End of tpm tests
# End of aarch64 tests




> 
> (gdb) bt
> #0 0x00005654e7ac4872 in tpm_tis_transfer (s=0x5654e7dec570, req=0x5654e7b42410 <tpm_pcrread> "\200\001", req_size=21, rsp=0x7ffd2295ca00 '\377' <repeats 200 times>..., rsp_size=1024) at ../tests/qtest/tpm-tis-util.c:490
> #1 0x00005654e7ac1879 in tpm_util_pcrread (s=0x5654e7dec570, tx=0x5654e7ac467a <tpm_tis_transfer>, exp_resp=0x5654e7b42620 <tpm_pcrread_resp> "\200\001", exp_resp_size=63) at ../tests/qtest/tpm-util.c:96
> #2 0xffffffffffffffff in ?? ()
> ...
> 
> (gdb) l
> 485         sts = qtest_readl(s, TIS_REG(0, TPM_TIS_REG_STS));
> 486         bcount = (sts >> 8) & 0xffff;
> 487
> 488         memset(rsp, 0, rsp_size);
> 489         for (i = 0; i < bcount; i++) {
> 490             rsp[i] = qtest_readb(s, TIS_REG(0, TPM_TIS_REG_DATA_FIFO));
> 491         }
> 492
> 493         /* relinquish use of locality 0 */
> 494         qtest_writeb(s, TIS_REG(0, TPM_TIS_REG_ACCESS),
> (gdb) p rsp_size
> $5 = 1024
> (gdb) p bcount
> $6 = 65535
> 
> Thanks!
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 18:58 ` Stefan Berger
@ 2024-10-15 19:22   ` Fabiano Rosas
  2024-10-15 19:32     ` Stefan Berger
  0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 19:22 UTC (permalink / raw)
  To: Stefan Berger, qemu-devel

Stefan Berger <stefanb@linux.ibm.com> writes:

> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>> Hi Stefan,
>> 
>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>> 
>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>> 135/138 qemu:qtest+qtest-aarch64 /
>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>
> Is this something new or been happening for a while? Does it happen 
> consistently? I just build the master branch and ran the tests on a very 
> old machine, I mean 13 years old. The host runs Fedora 40 with 
> libtpms-0.96-6 + swtpm-0.9.0 from the distro:

It's the first time I see it. I can reproduce consistently by running
that test in a loop while make -j16 check is running in another window.

I'm on openSUSE Leap 15.5:

$ rpm -qa | grep libtpms
libtpms0-0.8.2-150300.3.9.1.x86_64

$ rpm -qa | grep swtpm
swtpm-0.7.3-150500.2.1.x86_64


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 19:22   ` Fabiano Rosas
@ 2024-10-15 19:32     ` Stefan Berger
  2024-10-15 19:57       ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 19:32 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel



On 10/15/24 3:22 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
> 
>> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>>> Hi Stefan,
>>>
>>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>>>
>>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>>> 135/138 qemu:qtest+qtest-aarch64 /
>>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>>
>> Is this something new or been happening for a while? Does it happen
>> consistently? I just build the master branch and ran the tests on a very
>> old machine, I mean 13 years old. The host runs Fedora 40 with
>> libtpms-0.96-6 + swtpm-0.9.0 from the distro:
> 
> It's the first time I see it. I can reproduce consistently by running
> that test in a loop while make -j16 check is running in another window.

So this here is failing for you every time?

QTEST_QEMU_BINARY=build/qemu-system-aarch64 
./build/tests/qtest/tpm-tis-device-swtpm-test

This test would start swtpm. Do you see it running while this test is 
running? It should show process like this running for a very short time 
obviously  [ while :; do ps aux | grep -v grep | grep swtpm; done ]:

stefanb  4087462  0.0  0.0  11040  3732 pts/6    S+   15:30   0:00 swtpm 
socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2 --ctrl 
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2/sock --tpm2
stefanb  4087467  0.0  0.0  11040  3660 pts/6    S+   15:30   0:00 swtpm 
socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2 --ctrl 
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2/sock --tpm2


> 
> I'm on openSUSE Leap 15.5:
> 
> $ rpm -qa | grep libtpms
> libtpms0-0.8.2-150300.3.9.1.x86_64
> 
> $ rpm -qa | grep swtpm
> swtpm-0.7.3-150500.2.1.x86_64
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 19:32     ` Stefan Berger
@ 2024-10-15 19:57       ` Fabiano Rosas
  2024-10-15 20:57         ` Stefan Berger
  0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 19:57 UTC (permalink / raw)
  To: Stefan Berger, qemu-devel

Stefan Berger <stefanb@linux.ibm.com> writes:

> On 10/15/24 3:22 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>> 
>>> On 10/15/24 2:11 PM, Fabiano Rosas wrote:
>>>> Hi Stefan,
>>>>
>>>> I see the tpm-tis-device-swtpm test timing out, could you take a look?
>>>>
>>>> qemu:qtest+qtest-aarch64 / qtest-aarch64/tpm-tis-device-swtpm-test time out (After 60.0 seconds)
>>>> 135/138 qemu:qtest+qtest-aarch64 /
>>>> qtest-aarch64/tpm-tis-device-swtpm-test TIMEOUT 60.01s killed by signal 11 SIGSEGV
>>>
>>> Is this something new or been happening for a while? Does it happen
>>> consistently? I just build the master branch and ran the tests on a very
>>> old machine, I mean 13 years old. The host runs Fedora 40 with
>>> libtpms-0.96-6 + swtpm-0.9.0 from the distro:
>> 
>> It's the first time I see it. I can reproduce consistently by running
>> that test in a loop while make -j16 check is running in another window.
>
> So this here is failing for you every time?
>
> QTEST_QEMU_BINARY=build/qemu-system-aarch64 
> ./build/tests/qtest/tpm-tis-device-swtpm-test

Sorry, I was unclear. No, that runs for about 30 iterations before it
fails. I just ran each of these in a terminal window:

$ for i in $(seq 1 999); do echo "$i =============";  QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
$ make -j$(nproc) check
$ while :; do ps aux | grep -v grep | grep swtpm; done

The result is:

$ for i in $(seq 1 999); do echo "$i =============";
QTEST_QEMU_BINARY=./qemu-system-aarch64
./tests/qtest/tpm-tis-device-swtpm-test || break ; done
...
...
22 =============                               
# random seed: R02Se1d542a529b7d9bfdfdcaf9d5359cbb9                                            
1..2                                           
# Start of aarch64 tests                       
# Start of tpm tests                           
# Start of tis-swtpm tests                     
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display 
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev -accel qtest
ok 1 /aarch64/tpm/tis-swtpm/test               
# End of tis-swtpm tests                       
# Start of tis-swtpm-migration tests                                                           
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display 
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev  -accel qtest
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-14870.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-14870.qmp,id=char0 -mon chardev=char0,mode=control -display 
none -audio none -machine virt,gic-version=max -accel tcg -chardev socket,id=chr,path=/tmp/qemu-tpm-tis-device-swtpm-test.QXOBV2/sock -tpmdev emulator,id=dev,chardev=chr -device tpm-tis-devic
e,tpmdev=dev -incoming unix:/tmp/qemu-tpm-tis-device-swtpm-test.VWOBV2/migsocket -accel qtest
Segmentation fault (core dumped)               
$ swtpm: Data client disconnected
swtpm: Data client disconnected

>
> This test would start swtpm. Do you see it running while this test is 
> running? It should show process like this running for a very short time 
> obviously  [ while :; do ps aux | grep -v grep | grep swtpm; done ]:

I see several of:

fabiano 30146 0.0 0.0 18056 5564 pts/9 S 15:11 0:00 swtpm socket
--tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.DQFLV2 --ctrl
type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.DQFLV2/sock --tpm2

> stefanb  4087462  0.0  0.0  11040  3732 pts/6    S+   15:30   0:00 swtpm 
> socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2 --ctrl 
> type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.D3SLV2/sock --tpm2
> stefanb  4087467  0.0  0.0  11040  3660 pts/6    S+   15:30   0:00 swtpm 
> socket --tpmstate dir=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2 --ctrl 
> type=unixio,path=/tmp/qemu-tpm-tis-device-swtpm-test.H1SLV2/sock --tpm2
>
>
>> 
>> I'm on openSUSE Leap 15.5:
>> 
>> $ rpm -qa | grep libtpms
>> libtpms0-0.8.2-150300.3.9.1.x86_64
>> 
>> $ rpm -qa | grep swtpm
>> swtpm-0.7.3-150500.2.1.x86_64
>> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 19:57       ` Fabiano Rosas
@ 2024-10-15 20:57         ` Stefan Berger
  2024-10-15 22:02           ` Fabiano Rosas
  0 siblings, 1 reply; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 20:57 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel



On 10/15/24 3:57 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
> 

>>
>> So this here is failing for you every time?
>>
>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>> ./build/tests/qtest/tpm-tis-device-swtpm-test
> 
> Sorry, I was unclear. No, that runs for about 30 iterations before it
> fails. I just ran each of these in a terminal window:
> 
> $ for i in $(seq 1 999); do echo "$i =============";  QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done

On my Fedora 40 host this command line here alone has been running for 
250 loop iterations now and is still continuing.

> $ make -j$(nproc) check

So this needs to be run in parallel to the above command line to cause 
the failure?

> $ while :; do ps aux | grep -v grep | grep swtpm; done


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 20:57         ` Stefan Berger
@ 2024-10-15 22:02           ` Fabiano Rosas
  2024-10-15 23:35             ` Stefan Berger
  0 siblings, 1 reply; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-15 22:02 UTC (permalink / raw)
  To: Stefan Berger, qemu-devel

Stefan Berger <stefanb@linux.ibm.com> writes:

> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>> 
>
>>>
>>> So this here is failing for you every time?
>>>
>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>> 
>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>> fails. I just ran each of these in a terminal window:
>> 
>> $ for i in $(seq 1 999); do echo "$i =============";  QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>
> On my Fedora 40 host this command line here alone has been running for 
> 250 loop iterations now and is still continuing.
>
>> $ make -j$(nproc) check
>
> So this needs to be run in parallel to the above command line to cause 
> the failure?
>

Yes, I've been using that method to reproduce live migration race
conditions as well. It's quite effective.

If you don't think you'll be able to find the root cause due to the
unreproducibility on your side, maybe we could at least add an assert
that bcount is not larger than rsp_size. I think that would at least
give an explicit error instead of a buffer overflow.

I can also try to dig deeper into this when I get some time. At the
moment I know nothing about the tpm device emulation.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 22:02           ` Fabiano Rosas
@ 2024-10-15 23:35             ` Stefan Berger
  2024-10-16  1:52               ` Stefan Berger
  2024-10-16 14:03               ` Fabiano Rosas
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Berger @ 2024-10-15 23:35 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel



On 10/15/24 6:02 PM, Fabiano Rosas wrote:
> Stefan Berger <stefanb@linux.ibm.com> writes:
> 
>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>>
>>
>>>>
>>>> So this here is failing for you every time?
>>>>
>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>
>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>> fails. I just ran each of these in a terminal window:
>>>
>>> $ for i in $(seq 1 999); do echo "$i =============";  QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>
>> On my Fedora 40 host this command line here alone has been running for
>> 250 loop iterations now and is still continuing.
>>
>>> $ make -j$(nproc) check
>>
>> So this needs to be run in parallel to the above command line to cause
>> the failure?
>>
> 
> Yes, I've been using that method to reproduce live migration race
> conditions as well. It's quite effective.
> 
> If you don't think you'll be able to find the root cause due to the
> unreproducibility on your side, maybe we could at least add an assert
> that bcount is not larger than rsp_size. I think that would at least
> give an explicit error instead of a buffer overflow.
> 
> I can also try to dig deeper into this when I get some time. At the
> moment I know nothing about the tpm device emulation.
> 

The loop has run 3000 times by itself so that part is stable. However, 
it seems there is some other test case that the loop cannot run in 
parallel with. So, yes there is 'something'. ... ... Just having all 
CPUs in a system busy requires waiting for migration to be complete on 
the dst_qemu side as well. Can you try it with this patch:

diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index fb94496bbd..b52cd44841 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char 
*src_tpm_path,

      tpm_util_migrate(src_qemu, uri);
      tpm_util_wait_for_migration_complete(src_qemu);
+    tpm_util_wait_for_migration_complete(dst_qemu);

      tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
                       sizeof(tpm_pcrread_resp));


For me this fixes the issue I had seen where reading the STS register 
was done too early before all the TPM TIS state was completely restored. 
The active locality was -1 and STS return 0xffffffff and from then on 
things went bad.



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 23:35             ` Stefan Berger
@ 2024-10-16  1:52               ` Stefan Berger
  2024-10-16 14:03               ` Fabiano Rosas
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Berger @ 2024-10-16  1:52 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel



On 10/15/24 7:35 PM, Stefan Berger wrote:
> 
> 
> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>

>>>
>>
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>>
>> If you don't think you'll be able to find the root cause due to the
>> unreproducibility on your side, maybe we could at least add an assert
>> that bcount is not larger than rsp_size. I think that would at least
>> give an explicit error instead of a buffer overflow.
>>
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>>
> 
> The loop has run 3000 times by itself so that part is stable. However, 
> it seems there is some other test case that the loop cannot run in 
> parallel with. So, yes there is 'something'. ... ... Just having all 
> CPUs in a system busy requires waiting for migration to be complete on 
> the dst_qemu side as well. Can you try it with this patch:
> 
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char 
> *src_tpm_path,
> 
>       tpm_util_migrate(src_qemu, uri);
>       tpm_util_wait_for_migration_complete(src_qemu);
> +    tpm_util_wait_for_migration_complete(dst_qemu);
> 
>       tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
>                        sizeof(tpm_pcrread_resp));
> 

Only waiting on the destination side is also possible:

diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
index fb94496bbd..197714f8d9 100644
--- a/tests/qtest/tpm-tests.c
+++ b/tests/qtest/tpm-tests.c
@@ -114,7 +114,7 @@ void tpm_test_swtpm_migration_test(const char 
*src_tpm_path,
                       sizeof(tpm_pcrread_resp));

      tpm_util_migrate(src_qemu, uri);
-    tpm_util_wait_for_migration_complete(src_qemu);
+    tpm_util_wait_for_migration_complete(dst_qemu);

      tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
                       sizeof(tpm_pcrread_resp));



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: tpm-tis-device-swtpm-test timeout
  2024-10-15 23:35             ` Stefan Berger
  2024-10-16  1:52               ` Stefan Berger
@ 2024-10-16 14:03               ` Fabiano Rosas
  1 sibling, 0 replies; 10+ messages in thread
From: Fabiano Rosas @ 2024-10-16 14:03 UTC (permalink / raw)
  To: Stefan Berger, qemu-devel

Stefan Berger <stefanb@linux.ibm.com> writes:

> On 10/15/24 6:02 PM, Fabiano Rosas wrote:
>> Stefan Berger <stefanb@linux.ibm.com> writes:
>> 
>>> On 10/15/24 3:57 PM, Fabiano Rosas wrote:
>>>> Stefan Berger <stefanb@linux.ibm.com> writes:
>>>>
>>>
>>>>>
>>>>> So this here is failing for you every time?
>>>>>
>>>>> QTEST_QEMU_BINARY=build/qemu-system-aarch64
>>>>> ./build/tests/qtest/tpm-tis-device-swtpm-test
>>>>
>>>> Sorry, I was unclear. No, that runs for about 30 iterations before it
>>>> fails. I just ran each of these in a terminal window:
>>>>
>>>> $ for i in $(seq 1 999); do echo "$i =============";  QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/tpm-tis-device-swtpm-test || break ; done
>>>
>>> On my Fedora 40 host this command line here alone has been running for
>>> 250 loop iterations now and is still continuing.
>>>
>>>> $ make -j$(nproc) check
>>>
>>> So this needs to be run in parallel to the above command line to cause
>>> the failure?
>>>
>> 
>> Yes, I've been using that method to reproduce live migration race
>> conditions as well. It's quite effective.
>> 
>> If you don't think you'll be able to find the root cause due to the
>> unreproducibility on your side, maybe we could at least add an assert
>> that bcount is not larger than rsp_size. I think that would at least
>> give an explicit error instead of a buffer overflow.
>> 
>> I can also try to dig deeper into this when I get some time. At the
>> moment I know nothing about the tpm device emulation.
>> 
>
> The loop has run 3000 times by itself so that part is stable. However, 
> it seems there is some other test case that the loop cannot run in 
> parallel with. So, yes there is 'something'. ... ... Just having all 
> CPUs in a system busy requires waiting for migration to be complete on 
> the dst_qemu side as well. Can you try it with this patch:
>
> diff --git a/tests/qtest/tpm-tests.c b/tests/qtest/tpm-tests.c
> index fb94496bbd..b52cd44841 100644
> --- a/tests/qtest/tpm-tests.c
> +++ b/tests/qtest/tpm-tests.c
> @@ -115,6 +115,7 @@ void tpm_test_swtpm_migration_test(const char 
> *src_tpm_path,
>
>       tpm_util_migrate(src_qemu, uri);
>       tpm_util_wait_for_migration_complete(src_qemu);
> +    tpm_util_wait_for_migration_complete(dst_qemu);
>
>       tpm_util_pcrread(dst_qemu, tx, tpm_pcrread_resp,
>                        sizeof(tpm_pcrread_resp));
>
>
> For me this fixes the issue I had seen where reading the STS register 
> was done too early before all the TPM TIS state was completely restored. 
> The active locality was -1 and STS return 0xffffffff and from then on 
> things went bad.

Thanks, that fixes the issue. Could you send a patch please?


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-10-16 14:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-15 18:11 tpm-tis-device-swtpm-test timeout Fabiano Rosas
2024-10-15 18:58 ` Stefan Berger
2024-10-15 19:22   ` Fabiano Rosas
2024-10-15 19:32     ` Stefan Berger
2024-10-15 19:57       ` Fabiano Rosas
2024-10-15 20:57         ` Stefan Berger
2024-10-15 22:02           ` Fabiano Rosas
2024-10-15 23:35             ` Stefan Berger
2024-10-16  1:52               ` Stefan Berger
2024-10-16 14:03               ` Fabiano Rosas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).