* netdev-socket test hang (s390 host, mips64el guest, backtrace)
@ 2023-04-13 10:50 Peter Maydell
2023-04-13 12:05 ` Peter Maydell
0 siblings, 1 reply; 5+ messages in thread
From: Peter Maydell @ 2023-04-13 10:50 UTC (permalink / raw)
To: QEMU Developers; +Cc: Laurent Vivier
I just found a hung netdev-socket test on our s390 CI runner.
Looks like a deadlock, no processes using CPU.
Here's the backtrace; looks like both QEMU processes are sat
idle but the test process is sat waiting forever for something
in test_stream_inet_reconnect(). Any ideas?
Process tree:
netdev-socket(3496843)-+-qemu-system-mip(3496956)
`-qemu-system-mip(3496976)
===========================================================
PROCESS: 3496843
gitlab-+ 3496843 3472329 0 Apr10 ? 00:00:00
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/netdev-socket
--tap -k
[New LWP 3496844]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
__libc_recv (fd=fd@entry=3, buf=buf@entry=0x3ffe1cf9f27,
len=len@entry=1, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/recv.c:30
30 ../sysdeps/unix/sysv/linux/recv.c: No such file or directory.
Thread 2 (Thread 0x3ffa457f900 (LWP 3496844)):
#0 syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1 0x000002aa22cccbbc in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x2aa22d3c860 <rcu_call_ready_event>)
at ../util/qemu-thread-posix.c:464
#3 0x000002aa22cf89a2 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:261
#4 0x000002aa22ccbc22 in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5 0x000003ffa4807e66 in start_thread (arg=0x3ffa457f900) at
pthread_create.c:477
#6 0x000003ffa46fcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
Thread 1 (Thread 0x3ffa4c72770 (LWP 3496843)):
#0 __libc_recv (fd=fd@entry=3, buf=buf@entry=0x3ffe1cf9f27,
len=len@entry=1, flags=flags@entry=0) at
../sysdeps/unix/sysv/linux/recv.c:30
#1 0x000002aa22c9d982 in recv (__flags=0, __n=1, __buf=0x3ffe1cf9f27,
__fd=3) at /usr/include/s390x-linux-gnu/bits/socket2.h:44
#2 qmp_fd_receive (fd=<optimized out>) at ../tests/qtest/libqmp.c:73
#3 0x000002aa22c9baee in qtest_qmp_receive_dict (s=0x2aa232a50d0) at
../tests/qtest/libqtest.c:837
#4 qtest_qmp_eventwait_ref (event=<optimized out>, s=<optimized out>)
at ../tests/qtest/libqtest.c:837
#5 qtest_qmp_eventwait_ref (s=0x2aa232a50d0, event=<optimized out>)
at ../tests/qtest/libqtest.c:828
#6 0x000002aa22c9262c in wait_stream_connected (qts=<optimized out>,
addr=0x3ffe1cfa1b8, id=0x2aa22cfeed6 "st0") at
../tests/qtest/netdev-socket.c:157
#7 0x000002aa22c929b6 in test_stream_inet_reconnect () at
../tests/qtest/netdev-socket.c:229
#8 0x000003ffa49fe608 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#9 0x000003ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#10 0x000003ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#11 0x000003ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#12 0x000003ffa49fe392 in ?? () from /lib/s390x-linux-gnu/libglib-2.0.so.0
#13 0x000003ffa49feada in g_test_run_suite () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#14 0x000003ffa49feb10 in g_test_run () from
/lib/s390x-linux-gnu/libglib-2.0.so.0
#15 0x000002aa22c90678 in main (argc=<optimized out>, argv=<optimized
out>) at ../tests/qtest/netdev-socket.c:543
[Inferior 1 (process 3496843) detached]
===========================================================
PROCESS: 3496956
gitlab-+ 3496956 3496843 0 Apr10 ? 00:00:00
./qemu-system-mips64el -qtest unix:/tmp/qtest-3496843.sock -qtest-log
/dev/null -chardev socket,path=/tmp/qtest-3496843.qmp,id=char0 -mon
chardev=char0,mode=control -display none -nodefaults -M none -netdev
stream,server=false,id=st0,addr.type=inet,addr.ipv4=on,addr.ipv6=off,reconnect=1,addr.host=127.0.0.1,addr.port=50989
-accel qtest
[New LWP 3496965]
[New LWP 3496967]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x000003ff81af1c8c in __ppoll (fds=0x2aa40b08230, nfds=6,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
Thread 3 (Thread 0x3ff71c20900 (LWP 3496967)):
#0 0x000003ff81af1b32 in __GI___poll (fds=0x3ff64003680, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x000003ff842d4386 in () at /lib/s390x-linux-gnu/libglib-2.0.so.0
#2 0x000003ff842d4790 in g_main_loop_run () at
/lib/s390x-linux-gnu/libglib-2.0.so.0
#3 0x000002aa3ea03bbe in iothread_run
(opaque=opaque@entry=0x2aa4096bf00) at ../iothread.c:70
#4 0x000002aa3eb534ca in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5 0x000003ff81c07e66 in start_thread (arg=0x3ff71c20900) at
pthread_create.c:477
#6 0x000003ff81afcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
Thread 2 (Thread 0x3ff72d23900 (LWP 3496965)):
#0 syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1 0x000002aa3eb54464 in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x2aa3f1e01c8 <rcu_call_ready_event>)
at ../util/qemu-thread-posix.c:464
#3 0x000002aa3eb5e212 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:261
#4 0x000002aa3eb534ca in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5 0x000003ff81c07e66 in start_thread (arg=0x3ff72d23900) at
pthread_create.c:477
#6 0x000003ff81afcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
Thread 1 (Thread 0x3ff8547e4a0 (LWP 3496956)):
#0 0x000003ff81af1c8c in __ppoll (fds=0x2aa40b08230, nfds=6,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000002aa3eb6a486 in ppoll (__ss=0x0, __timeout=0x0,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/s390x-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at ../util/qemu-timer.c:339
#3 0x000002aa3eb67460 in os_host_main_loop_wait (timeout=-1) at
../util/main-loop.c:308
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:592
#5 0x000002aa3e77082c in qemu_main_loop () at ../softmmu/runstate.c:731
#6 0x000002aa3e59d5da in qemu_default_main () at ../softmmu/main.c:37
#7 0x000003ff81a2440a in __libc_start_main (main=0x2aa3e59b0c0
<main>, argc=<optimized out>, argv=0x3fffbdfaf68, init=<optimized
out>, fini=<optimized out>, rtld_fini=0x3ff85490e50 <_dl_fini>,
stack_end=0x3fffbdfaeb0) at libc-start.c:308
#8 0x000002aa3e59d504 in _start () at ../softmmu/main.c:48
[Inferior 1 (process 3496956) detached]
===========================================================
PROCESS: 3496976
gitlab-+ 3496976 3496843 0 Apr10 ? 00:00:00
./qemu-system-mips64el -qtest unix:/tmp/qtest-3496843.sock -qtest-log
/dev/null -chardev socket,path=/tmp/qtest-3496843.qmp,id=char0 -mon
chardev=char0,mode=control -display none -nodefaults -M none -netdev
stream,id=st0,server=true,addr.type=inet,addr.ipv4=on,addr.ipv6=off,addr.host=127.0.0.1,addr.port=50989
-accel qtest
[New LWP 3496981]
[New LWP 3496983]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/s390x-linux-gnu/libthread_db.so.1".
0x000003ff88bf1c8c in __ppoll (fds=0x2aa0d0cbe70, nfds=6,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.
Thread 3 (Thread 0x3ff78d20900 (LWP 3496983)):
#0 0x000003ff88bf1b32 in __GI___poll (fds=0x3ff6c001610, nfds=3,
timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x000003ff8b3d4386 in () at /lib/s390x-linux-gnu/libglib-2.0.so.0
#2 0x000003ff8b3d4790 in g_main_loop_run () at
/lib/s390x-linux-gnu/libglib-2.0.so.0
#3 0x000002aa0bc03bbe in iothread_run
(opaque=opaque@entry=0x2aa0cf4c700) at ../iothread.c:70
#4 0x000002aa0bd534ca in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5 0x000003ff88d07e66 in start_thread (arg=0x3ff78d20900) at
pthread_create.c:477
#6 0x000003ff88bfcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
Thread 2 (Thread 0x3ff79e23900 (LWP 3496981)):
#0 syscall () at ../sysdeps/unix/sysv/linux/s390/s390-64/syscall.S:37
#1 0x000002aa0bd54464 in qemu_futex_wait (val=<optimized out>,
f=<optimized out>) at
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x2aa0c3e01c8 <rcu_call_ready_event>)
at ../util/qemu-thread-posix.c:464
#3 0x000002aa0bd5e212 in call_rcu_thread (opaque=opaque@entry=0x0) at
../util/rcu.c:261
#4 0x000002aa0bd534ca in qemu_thread_start (args=<optimized out>) at
../util/qemu-thread-posix.c:541
#5 0x000003ff88d07e66 in start_thread (arg=0x3ff79e23900) at
pthread_create.c:477
#6 0x000003ff88bfcbe6 in thread_start () at
../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65
Thread 1 (Thread 0x3ff8c57e4a0 (LWP 3496976)):
#0 0x000003ff88bf1c8c in __ppoll (fds=0x2aa0d0cbe70, nfds=6,
timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0)
at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000002aa0bd6a486 in ppoll (__ss=0x0, __timeout=0x0,
__nfds=<optimized out>, __fds=<optimized out>) at
/usr/include/s390x-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at ../util/qemu-timer.c:339
#3 0x000002aa0bd67460 in os_host_main_loop_wait (timeout=-1) at
../util/main-loop.c:308
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:592
#5 0x000002aa0b97082c in qemu_main_loop () at ../softmmu/runstate.c:731
#6 0x000002aa0b79d5da in qemu_default_main () at ../softmmu/main.c:37
#7 0x000003ff88b2440a in __libc_start_main (main=0x2aa0b79b0c0
<main>, argc=<optimized out>, argv=0x3ffe28fa838, init=<optimized
out>, fini=<optimized out>, rtld_fini=0x3ff8c590e50 <_dl_fini>,
stack_end=0x3ffe28fa780) at libc-start.c:308
#8 0x000002aa0b79d504 in _start () at ../softmmu/main.c:48
[Inferior 1 (process 3496976) detached]
thanks
-- PMM
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)
2023-04-13 10:50 netdev-socket test hang (s390 host, mips64el guest, backtrace) Peter Maydell
@ 2023-04-13 12:05 ` Peter Maydell
2023-04-17 9:06 ` Laurent Vivier
0 siblings, 1 reply; 5+ messages in thread
From: Peter Maydell @ 2023-04-13 12:05 UTC (permalink / raw)
To: QEMU Developers; +Cc: Laurent Vivier
On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> I just found a hung netdev-socket test on our s390 CI runner.
> Looks like a deadlock, no processes using CPU.
> Here's the backtrace; looks like both QEMU processes are sat
> idle but the test process is sat waiting forever for something
> in test_stream_inet_reconnect(). Any ideas?
May well not be related, but I think there's a race condition
in this test's inet_get_free_port() code. The code tries
to find a free port number by creating a socket, looking
at what port it is bound to, and then closing the socket.
If there are several copies of this test running at once
(as is plausible in a 'make -j8' setup), then you can
get an interleaving:
test 1 test 2
find a port number
close the socket
find a port number
(get the same number as test 1)
close the socket
use port number for test
use port number for test
(fail because of test 1)
thanks
-- PMM
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)
2023-04-13 12:05 ` Peter Maydell
@ 2023-04-17 9:06 ` Laurent Vivier
2023-04-17 10:16 ` Alex Bennée
0 siblings, 1 reply; 5+ messages in thread
From: Laurent Vivier @ 2023-04-17 9:06 UTC (permalink / raw)
To: Peter Maydell, QEMU Developers
Hi Peter,
On 4/13/23 14:05, Peter Maydell wrote:
> On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> I just found a hung netdev-socket test on our s390 CI runner.
>> Looks like a deadlock, no processes using CPU.
>> Here's the backtrace; looks like both QEMU processes are sat
>> idle but the test process is sat waiting forever for something
>> in test_stream_inet_reconnect(). Any ideas?
>
> May well not be related, but I think there's a race condition
> in this test's inet_get_free_port() code. The code tries
> to find a free port number by creating a socket, looking
> at what port it is bound to, and then closing the socket.
> If there are several copies of this test running at once
> (as is plausible in a 'make -j8' setup), then you can
> get an interleaving:
>
> test 1 test 2
> find a port number
> close the socket
> find a port number
> (get the same number as test 1)
> close the socket
> use port number for test
> use port number for test
> (fail because of test 1)
>
I don't see an easy way to avoid to race, but perhaps we can change the test to use unix
socket rather than inet one? In this case we can use an unique name.
Thanks,
Laurent
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)
2023-04-17 9:06 ` Laurent Vivier
@ 2023-04-17 10:16 ` Alex Bennée
2023-04-17 13:02 ` Laurent Vivier
0 siblings, 1 reply; 5+ messages in thread
From: Alex Bennée @ 2023-04-17 10:16 UTC (permalink / raw)
To: Laurent Vivier; +Cc: Peter Maydell, qemu-devel
Laurent Vivier <lvivier@redhat.com> writes:
> Hi Peter,
>
> On 4/13/23 14:05, Peter Maydell wrote:
>> On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>
>>> I just found a hung netdev-socket test on our s390 CI runner.
>>> Looks like a deadlock, no processes using CPU.
>>> Here's the backtrace; looks like both QEMU processes are sat
>>> idle but the test process is sat waiting forever for something
>>> in test_stream_inet_reconnect(). Any ideas?
>> May well not be related, but I think there's a race condition
>> in this test's inet_get_free_port() code. The code tries
>> to find a free port number by creating a socket, looking
>> at what port it is bound to, and then closing the socket.
>> If there are several copies of this test running at once
>> (as is plausible in a 'make -j8' setup), then you can
>> get an interleaving:
>> test 1 test 2
>> find a port number
>> close the socket
>> find a port number
>> (get the same number as test 1)
>> close the socket
>> use port number for test
>> use port number for test
>> (fail because of test 1)
>>
>
> I don't see an easy way to avoid to race, but perhaps we can change
> the test to use unix socket rather than inet one? In this case we can
> use an unique name.
We could use a lock file that would stop the test clashing with itself
(although another process on the machine could still race for the
socket). The unix socket would be easier but wouldn't we loose test
coverage or do we not care about the exact details for this test?
>
> Thanks,
> Laurent
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: netdev-socket test hang (s390 host, mips64el guest, backtrace)
2023-04-17 10:16 ` Alex Bennée
@ 2023-04-17 13:02 ` Laurent Vivier
0 siblings, 0 replies; 5+ messages in thread
From: Laurent Vivier @ 2023-04-17 13:02 UTC (permalink / raw)
To: Alex Bennée; +Cc: Peter Maydell, qemu-devel
On 4/17/23 12:16, Alex Bennée wrote:
>
> Laurent Vivier <lvivier@redhat.com> writes:
>
>> Hi Peter,
>>
>> On 4/13/23 14:05, Peter Maydell wrote:
>>> On Thu, 13 Apr 2023 at 11:50, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>>
>>>> I just found a hung netdev-socket test on our s390 CI runner.
>>>> Looks like a deadlock, no processes using CPU.
>>>> Here's the backtrace; looks like both QEMU processes are sat
>>>> idle but the test process is sat waiting forever for something
>>>> in test_stream_inet_reconnect(). Any ideas?
>>> May well not be related, but I think there's a race condition
>>> in this test's inet_get_free_port() code. The code tries
>>> to find a free port number by creating a socket, looking
>>> at what port it is bound to, and then closing the socket.
>>> If there are several copies of this test running at once
>>> (as is plausible in a 'make -j8' setup), then you can
>>> get an interleaving:
>>> test 1 test 2
>>> find a port number
>>> close the socket
>>> find a port number
>>> (get the same number as test 1)
>>> close the socket
>>> use port number for test
>>> use port number for test
>>> (fail because of test 1)
>>>
>>
>> I don't see an easy way to avoid to race, but perhaps we can change
>> the test to use unix socket rather than inet one? In this case we can
>> use an unique name.
>
> We could use a lock file that would stop the test clashing with itself
> (although another process on the machine could still race for the
> socket). The unix socket would be easier but wouldn't we loose test
> coverage or do we not care about the exact details for this test?
According to the backtrace, the problem happens with the reconnect test because we launch
twice the server side to test the client reconnection. I think this test can be moved to
an unix socket without any issue. For the other inet tests, there can be a race but the
window is much more shorter, and we want to test inet, not unix. Even with a lock file,
the port can be taken by another process.
Thanks,
Laurent
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-04-17 13:03 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-13 10:50 netdev-socket test hang (s390 host, mips64el guest, backtrace) Peter Maydell
2023-04-13 12:05 ` Peter Maydell
2023-04-17 9:06 ` Laurent Vivier
2023-04-17 10:16 ` Alex Bennée
2023-04-17 13:02 ` Laurent Vivier
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).