From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Thomas Huth <thuth@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	Qemu-block <qemu-block@nongnu.org>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: Problem with iotest 233
Date: Tue, 25 Feb 2025 17:57:51 +0000
Message-ID: <Z74En98KD0v11X8w@redhat.com>
In-Reply-To: <574cdf2e-6b8c-4ff3-9a2b-a7d00c92a788@redhat.com>

On Tue, Feb 25, 2025 at 06:52:43PM +0100, Thomas Huth wrote:
> On 25/02/2025 18.44, Thomas Huth wrote:
> > On 25/02/2025 11.12, Kevin Wolf wrote:
> > > On 25.02.2025 at 08:20, Thomas Huth wrote:
> > > > 
> > > >   Hi!
> > > > 
> > > > I'm facing a weird hang in iotest 233 on my Fedora 41 laptop. When running
> > > > 
> > > >   ./check -raw 233
> > > > 
> > > > the test simply hangs. Looking at the log, the last message is "== check
> > > > plain client to TLS server fails ==". I added some debug messages, and it
> > > > seems like the previous NBD server is not correctly terminated here.
> > > > The test works fine again if I apply this patch:
> > > > 
> > > > diff --git a/tests/qemu-iotests/common.nbd b/tests/qemu-iotests/common.nbd
> > > > --- a/tests/qemu-iotests/common.nbd
> > > > +++ b/tests/qemu-iotests/common.nbd
> > > > @@ -35,7 +35,7 @@ nbd_server_stop()
> > > >           read NBD_PID < "$nbd_pid_file"
> > > >           rm -f "$nbd_pid_file"
> > > >           if [ -n "$NBD_PID" ]; then
> > > > -            kill "$NBD_PID"
> > > > +            kill -9 "$NBD_PID"
> > > >           fi
> > > >       fi
> > > >       rm -f "$nbd_unix_socket" "$nbd_stderr_fifo"
> > > > 
> > > > ... but that does not look like the right solution to me. What could prevent
> > > > the qemu-nbd from correctly shutting down when it receives a normal SIGTERM
> > > > signal?
> > > 
> > > Not sure. In theory, qemu_system_killed() should set state = TERMINATE
> > > and make main_loop_wait() return through the notification, which should
> > > then make it shut down. Maybe you can attach gdb and check what 'state'
> > > is when it hangs and if it's still in the main loop?
> > 
> > I attached a gdb and ran "bt", and it looks like it is hanging in an
> > exit() handler:
> > 
> > (gdb) bt
> > #0  0x00007f127f8fff1d in syscall () from /lib64/libc.so.6
> > #1  0x00007f127fd32e1d in g_cond_wait () from /lib64/libglib-2.0.so.0
> > #2  0x00005583df3048b2 in flush_trace_file (wait=true) at ../../devel/qemu/trace/simple.c:140
> > #3  st_flush_trace_buffer () at ../../devel/qemu/trace/simple.c:383
> > #4  0x00007f127f8296c1 in __run_exit_handlers () from /lib64/libc.so.6
> > #5  0x00007f127f82978e in exit () from /lib64/libc.so.6
> > #6  0x00005583df1ae9e1 in main (argc=<optimized out>, argv=<optimized out>) at ../../devel/qemu/qemu-nbd.c:1242
> 
> Ah, now that I wrote that: I recently ran "configure" with
> --enable-trace-backends=simple ... when I remove that from "config.status"
> again, then the test works fine again 8-)
> 
> Still, I think it should not hang with the simple trace backend here, should it?

IIUC this is waiting on trace_empty_cond.

This condition should be signalled from wait_for_trace_records_available(),
which is in turn called from writeout_thread().

That thread is started by st_init(), which is called from
trace_init_backends(), which in turn should be called from qemu-nbd. I would
therefore expect the thread to still be running when the exit() handlers are
run.
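
To make the suspected failure mode concrete, here is a minimal sketch of this
kind of two-condition handshake. It is a toy model in Python, not the code in
trace/simple.c: the class TraceFlusher, its methods, and the helper run() are
hypothetical names, and the flush uses a timeout only so that the demo
terminates. The backtrace above shows an untimed g_cond_wait(), which would
simply block if nobody ever signals the condition.

```python
import threading


class TraceFlusher:
    """Toy model of a flush/writeout handshake (hypothetical, not QEMU code)."""

    def __init__(self):
        self.lock = threading.Lock()
        # Two conditions sharing one lock, mirroring an "available" and an
        # "empty" condition guarded by a single trace lock.
        self.available_cond = threading.Condition(self.lock)
        self.empty_cond = threading.Condition(self.lock)
        self.available = False

    def writeout_once(self):
        """Writer side: wait for records, 'write' them, then signal empty."""
        with self.lock:
            self.available_cond.wait_for(lambda: self.available)
            self.available = False
            self.empty_cond.notify()

    def flush(self, timeout):
        """Flusher side: announce records, then wait for the writer's ack.

        Returns True if the writer acknowledged, False if the wait timed out.
        """
        with self.lock:
            self.available = True
            self.available_cond.notify()
            done = self.empty_cond.wait_for(
                lambda: not self.available, timeout=timeout
            )
            self.available = False  # reset so the object can be reused
            return done


def run(start_writer, timeout=1.0):
    """Run one flush, with or without a writer thread present."""
    flusher = TraceFlusher()
    writer = None
    if start_writer:
        writer = threading.Thread(target=flusher.writeout_once)
        writer.start()
    ok = flusher.flush(timeout)
    if writer is not None:
        writer.join()
    return ok


if __name__ == "__main__":
    print("writer running:", run(True))
    print("writer missing:", run(False, timeout=0.2))
```

With a writer thread present the flush is acknowledged; without one the wait
only ends because of the timeout. That is why the presence or absence of the
writeout thread at exit() time matters here.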

Does GDB show any other threads running at the time of this hang?


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



