From: Thomas Huth <thuth@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
Eric Blake <eblake@redhat.com>,
Qemu-block <qemu-block@nongnu.org>,
QEMU Developers <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: Problem with iotest 233
Date: Wed, 26 Feb 2025 07:40:48 +0100 [thread overview]
Message-ID: <4dbc253d-fd6c-4feb-9597-81afbfa8d670@redhat.com> (raw)
In-Reply-To: <7a81b491-d982-4a5b-b250-f7b772f64eb4@redhat.com>
On 25/02/2025 22.00, Thomas Huth wrote:
> On 25/02/2025 21.35, Thomas Huth wrote:
>> On 25/02/2025 18.57, Daniel P. Berrangé wrote:
>>> On Tue, Feb 25, 2025 at 06:52:43PM +0100, Thomas Huth wrote:
>>>> On 25/02/2025 18.44, Thomas Huth wrote:
>>>>> On 25/02/2025 11.12, Kevin Wolf wrote:
>>>>>> Am 25.02.2025 um 08:20 hat Thomas Huth geschrieben:
>>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I'm facing a weird hang in iotest 233 on my Fedora 41 laptop. When
>>>>>>> running
>>>>>>>
>>>>>>> ./check -raw 233
>>>>>>>
>>>>>>> the test simply hangs. Looking at the log, the last message is "== check
>>>>>>> plain client to TLS server fails ==". I added some debug messages,
>>>>>>> and it
>>>>>>> seems like the previous NBD server is not correctly terminated here.
>>>>>>> The test works fine again if I apply this patch:
>>>>>>>
>>>>>>> diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/
>>>>>>> common.nbd
>>>>>>> --- a/tests/qemu-iotests/common.nbd
>>>>>>> +++ b/tests/qemu-iotests/common.nbd
>>>>>>> @@ -35,7 +35,7 @@ nbd_server_stop()
>>>>>>> read NBD_PID < "$nbd_pid_file"
>>>>>>> rm -f "$nbd_pid_file"
>>>>>>> if [ -n "$NBD_PID" ]; then
>>>>>>> - kill "$NBD_PID"
>>>>>>> + kill -9 "$NBD_PID"
>>>>>>> fi
>>>>>>> fi
>>>>>>> rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"
>>>>>>>
>>>>>>> ... but that does not look like the right solution to me. What could
>>>>>>> prevent
>>>>>>> the qemu-nbd from correctly shutting down when it receives a normal
>>>>>>> SIGTERM
>>>>>>> signal?
>>>>>>
>>>>>> Not sure. In theory, qemu_system_killed() should set state = TERMINATE
>>>>>> and make main_loop_wait() return through the notification, which should
>>>>>> then make it shut down. Maybe you can attach gdb and check what 'state'
>>>>>> is when it hangs and if it's still in the main loop?
>>>>>
>>>>> I attached a gdb and ran "bt", and it looks like it is hanging in an
>>>>> exit() handler:
>>>>>
>>>>> (gdb) bt
>>>>> #0 0x00007f127f8fff1d in syscall () from /lib64/libc.so.6
>>>>> #1 0x00007f127fd32e1d in g_cond_wait () from /lib64/libglib-2.0.so.0
>>>>> #2 0x00005583df3048b2 in flush_trace_file (wait=true) at
>>>>> ../../devel/qemu/ trace/simple.c:140
>>>>> #3 st_flush_trace_buffer () at ../../devel/qemu/trace/simple.c:383
>>>>> #4 0x00007f127f8296c1 in __run_exit_handlers () from /lib64/libc.so.6
>>>>> #5 0x00007f127f82978e in exit () from /lib64/libc.so.6
>>>>> #6 0x00005583df1ae9e1 in main (argc=<optimized out>, argv=<optimized
>>>>> out>) at ../../devel/qemu/qemu-nbd.c:1242
>>>>
>>>> Ah, now that I wrote that: I recently ran "configure" with
>>>> --enable-trace-backends=simple ... when I remove that from "config.status"
>>>> again, then the test works fine again 8-)
>>>>
>>>> Still, I think it should not hang with the simple trace backend here,
>>>> should it?
>>>
>>> IIUC this is waiting on trace_empty_cond.
>>>
>>> This condition should be signalled from wait_for_trace_records_available
>>> which is in turn called from writeout_thread.
>>>
>>> This thread is started from st_init, which is called from
>>> trace_init_backends
>>> which should be called from qemu-nbd. I would expect this thread to still
>>> be running when exit() handlers are run.
>>>
>>> Does GDB show any other threads running at the time of this hang ?
>>
>> There is indeed a second thread running:
>>
>> (gdb) thread apply all bt
>>
>> Thread 2 (Thread 0x7f657096b6c0 (LWP 1117884) "qemu-nbd"):
>> #0 0x00007f6573419f1d in syscall () from /lib64/libc.so.6
>> #1 0x0000562bbad9b783 in qemu_futex_wait (f=0x562bbaed25d8
>> <rcu_call_ready_event>, val=4294967295) at ../../devel/qemu/include/qemu/
>> futex.h:29
>> #2 0x0000562bbad9b9af in qemu_event_wait (ev=0x562bbaed25d8
>> <rcu_call_ready_event>) at ../../devel/qemu/util/qemu-thread-posix.c:465
>> #3 0x0000562bbada86a6 in call_rcu_thread (opaque=0x0) at ../../devel/
>> qemu/ util/rcu.c:278
>> #4 0x0000562bbad9bba3 in qemu_thread_start (args=0x562bf958a5c0)
>> at ../../ devel/qemu/util/qemu-thread-posix.c:542
>> #5 0x00007f6573398168 in start_thread () from /lib64/libc.so.6
>> #6 0x00007f657341c14c in __clone3 () from /lib64/libc.so.6
>
> Though, that does not look like the thread from the simpletrace, but the the
> QEMU RCU thread instead ... so no clue where that writer thread might have
> gone...
OK, I think I now understood the problem: qemu-nbd is calling
trace_init_backends() first, which creates the simpletrace threads and
installs the atexit() handler. Then it is calling fork() since the test uses
the --fork command line option. But fork() does not clone the simpletrace
thread into the new process, only the main thread (see man-page of fork, the
new process starts single-threaded). So when the new child process exits,
the exit handler calls the simple trace flush function which tries to wait
for a thread that has never been created in that process.
The test works when I move the trace_init_backends() behind the fork() in
the main function... but I am not sure if we would miss some logs this way,
so I don't know whether that's the right solution. Could a qemu-nbd expert
please have a look at this?
Thanks,
Thomas
next prev parent reply other threads:[~2025-02-26 6:41 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-25 7:20 Problem with iotest 233 Thomas Huth
2025-02-25 10:12 ` Kevin Wolf
2025-02-25 17:44 ` Thomas Huth
2025-02-25 17:52 ` Thomas Huth
2025-02-25 17:57 ` Daniel P. Berrangé
2025-02-25 20:35 ` Thomas Huth
2025-02-25 21:00 ` Thomas Huth
2025-02-26 6:40 ` Thomas Huth [this message]
2025-02-26 8:55 ` Thomas Huth
2025-02-27 19:18 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4dbc253d-fd6c-4feb-9597-81afbfa8d670@redhat.com \
--to=thuth@redhat.com \
--cc=berrange@redhat.com \
--cc=eblake@redhat.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).