* [Qemu-devel] [PULL for-3.0 0/1] Tracing patches @ 2018-07-24 14:25 Stefan Hajnoczi 2018-07-24 14:25 ` [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) Stefan Hajnoczi 2018-07-24 15:13 ` [Qemu-devel] [PULL for-3.0 0/1] Tracing patches Eric Blake 0 siblings, 2 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2018-07-24 14:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell

The following changes since commit 768cef2974fb1fa30dd188b043ea737e13fea477:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2018-07-24 10:37:52 +0100)

are available in the Git repository at:

  git://github.com/stefanha/qemu.git tags/tracing-pull-request

for you to fetch changes up to b6ad6a528f2356b357bd1d210048dd7988dc9a7b:

  trace/simple: fix hang in child after fork(2) (2018-07-24 14:27:51 +0100)

----------------------------------------------------------------
Pull request

Fix qemu-iotests 147 when QEMU was built with ./configure --enable-trace-backend=simple.

----------------------------------------------------------------
Stefan Hajnoczi (1):
  trace/simple: fix hang in child after fork(2)

 trace/simple.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 9 deletions(-)

--
2.17.1
* [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) 2018-07-24 14:25 [Qemu-devel] [PULL for-3.0 0/1] Tracing patches Stefan Hajnoczi @ 2018-07-24 14:25 ` Stefan Hajnoczi 2018-07-24 14:35 ` Daniel P. Berrangé 2018-07-24 15:16 ` Eric Blake 2018-07-24 15:13 ` [Qemu-devel] [PULL for-3.0 0/1] Tracing patches Eric Blake 1 sibling, 2 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2018-07-24 14:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Stefan Hajnoczi, Peter Maydell

The simple trace backend spawns a write-out thread which is used to
asynchronously flush the in-memory ring buffer to disk.

fork(2) does not clone all threads, only the thread that invoked
fork(2). As a result there is no write-out thread in the child process!

This causes a hang during shutdown when atexit(3) handler installed by
the simple trace backend waits for the non-existent write-out thread.

This patch uses pthread_atfork(3) to terminate the write-out thread
before fork and restart it in both the parent and child after fork.
This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.

Reported-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180717101944.11691-1-stefanha@redhat.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 trace/simple.c | 80 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index 701dec639c..a4300b6ff1 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -39,9 +39,11 @@
 static GMutex trace_lock;
 static GCond trace_available_cond;
 static GCond trace_empty_cond;
+static GThread *trace_writeout_thread;
 
 static bool trace_available;
 static bool trace_writeout_enabled;
+static bool trace_writeout_running;
 
 enum {
     TRACE_BUF_LEN = 4096 * 64,
@@ -142,15 +144,34 @@ static void flush_trace_file(bool wait)
     g_mutex_unlock(&trace_lock);
 }
 
-static void wait_for_trace_records_available(void)
+/**
+ * Wait to be kicked by flush_trace_file()
+ *
+ * Returns: true if the writeout thread should continue
+ *          false if the writeout thread should terminate
+ */
+static bool wait_for_trace_records_available(void)
 {
+    bool running;
+
     g_mutex_lock(&trace_lock);
-    while (!(trace_available && trace_writeout_enabled)) {
+    for (;;) {
+        running = trace_writeout_running;
+        if (!running) {
+            break;
+        }
+
+        if (trace_available && trace_writeout_enabled) {
+            break;
+        }
+
         g_cond_signal(&trace_empty_cond);
         g_cond_wait(&trace_available_cond, &trace_lock);
     }
     trace_available = false;
     g_mutex_unlock(&trace_lock);
+
+    return running;
 }
 
 static gpointer writeout_thread(gpointer opaque)
@@ -165,9 +186,7 @@ static gpointer writeout_thread(gpointer opaque)
     size_t unused __attribute__ ((unused));
     uint64_t type = TRACE_RECORD_TYPE_EVENT;
 
-    for (;;) {
-        wait_for_trace_records_available();
-
+    while (wait_for_trace_records_available()) {
         if (g_atomic_int_get(&dropped_events)) {
             dropped.rec.event = DROPPED_EVENT_ID,
             dropped.rec.timestamp_ns = get_clock();
@@ -398,18 +417,61 @@ static GThread *trace_thread_create(GThreadFunc fn)
     return thread;
 }
 
+#ifndef _WIN32
+static void stop_writeout_thread(void)
+{
+    g_mutex_lock(&trace_lock);
+    trace_writeout_running = false;
+    g_cond_signal(&trace_available_cond);
+    g_mutex_unlock(&trace_lock);
+
+    g_thread_join(trace_writeout_thread);
+    trace_writeout_thread = NULL;
+
+    /* Hold trace_lock across fork! Since threads aren't cloned by fork() the
+     * mutex would be held in the child process and cause a deadlock.
+     * Acquiring the mutex here prevents other threads from being in a
+     * trace_lock critical region when fork() occurs.
+     */
+    g_mutex_lock(&trace_lock);
+}
+
+static void restart_writeout_thread(void)
+{
+    trace_writeout_running = true;
+    trace_writeout_thread = trace_thread_create(writeout_thread);
+    if (!trace_writeout_thread) {
+        warn_report("unable to initialize simple trace backend");
+    }
+
+    /* This relies on undefined behavior in the fork() child (it's fine in the
+     * fork() parent). g_mutex_unlock() on a mutex acquired by another thread
+     * is undefined (see glib documentation).
+     */
+    g_mutex_unlock(&trace_lock);
+}
+#endif /* !_WIN32 */
+
 bool st_init(void)
 {
-    GThread *thread;
-
     trace_pid = getpid();
+    trace_writeout_running = true;
 
-    thread = trace_thread_create(writeout_thread);
-    if (!thread) {
+    trace_writeout_thread = trace_thread_create(writeout_thread);
+    if (!trace_writeout_thread) {
         warn_report("unable to initialize simple trace backend");
         return false;
     }
 
+#ifndef _WIN32
+    /* Terminate writeout thread across fork and restart it in parent and
+     * child afterwards.
+     */
+    pthread_atfork(stop_writeout_thread,
+                   restart_writeout_thread,
+                   restart_writeout_thread);
+#endif
+
     atexit(st_flush_trace_buffer);
     return true;
 }
--
2.17.1
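The stop-before-fork, restart-after-fork idea in this patch can be sketched outside QEMU with plain pthreads. This is only an illustration under stated assumptions, not the code from trace/simple.c: every name (`worker_init`, `start_worker`, `stop_worker`, `fork_and_check`) is hypothetical, pthread primitives stand in for GLib, and the sketch deliberately does not hold the mutex across fork(). It also assumes the worker is the only extra thread, so the process is single-threaded at the moment of fork() and creating a thread from the child handler is reasonable.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names throughout; this is not the code in trace/simple.c. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;
static pthread_t worker;
static bool worker_running;   /* protected by lock while the worker exists */
static bool worker_started;   /* only touched by the forking thread */

/* Stand-in for the write-out thread: sleeps until told to terminate. */
static void *worker_fn(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&lock);
    while (worker_running) {
        pthread_cond_wait(&kick, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void start_worker(void)
{
    worker_running = true;    /* no worker exists yet, so no lock is needed */
    worker_started = pthread_create(&worker, NULL, worker_fn, NULL) == 0;
}

static void stop_worker(void)
{
    if (!worker_started) {
        return;
    }
    pthread_mutex_lock(&lock);
    worker_running = false;
    pthread_cond_signal(&kick);
    pthread_mutex_unlock(&lock);
    pthread_join(worker, NULL);
    worker_started = false;
}

/* Start the worker and have it stopped before, and restarted after, fork(). */
static int worker_init(void)
{
    start_worker();
    if (!worker_started) {
        return -1;
    }
    return pthread_atfork(stop_worker, start_worker, start_worker);
}

/* Fork once; the child exits with 0 if it had a worker and could join it. */
static int fork_and_check(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        bool had_worker = worker_started;
        stop_worker();
        _exit(had_worker ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

A caller would invoke `worker_init()` once and then `fork_and_check()`. Without the `pthread_atfork()` registration the child would have no worker at all, which is the situation the commit message describes.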
* Re: [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) 2018-07-24 14:25 ` [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) Stefan Hajnoczi @ 2018-07-24 14:35 ` Daniel P. Berrangé 2018-07-24 14:41 ` Daniel P. Berrangé 2018-07-24 15:16 ` Eric Blake 1 sibling, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2018-07-24 14:35 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, Peter Maydell

On Tue, Jul 24, 2018 at 03:25:04PM +0100, Stefan Hajnoczi wrote:
> The simple trace backend spawns a write-out thread which is used to
> asynchronously flush the in-memory ring buffer to disk.
>
> fork(2) does not clone all threads, only the thread that invoked
> fork(2). As a result there is no write-out thread in the child process!
>
> This causes a hang during shutdown when atexit(3) handler installed by
> the simple trace backend waits for the non-existent write-out thread.
>
> This patch uses pthread_atfork(3) to terminate the write-out thread
> before fork and restart it in both the parent and child after fork.
> This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.

I'm not convinced this is safe, as it looks like it has a window in
which both the parent and child processes will be doing write-out to
the same file.

In particular in the main QEMU system emulators it means that any
time we fork() in QEMU, eg for spawning commands with migration
exec: URI, or TAP device ifup scripts, etc, we'll be starting a
write-out thread in the child.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
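The concern about parent and child writing to the same file follows from how fork() treats open files: the child's descriptor refers to the same open file description as the parent's, including the file offset. The sketch below illustrates only that mechanism; it assumes the trace file is already open when fork() happens, and the function name and temporary path are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the parent's file offset after the child has
 * written 6 bytes through the inherited descriptor, or -1 on error.
 */
static long offset_after_child_write(void)
{
    char path[] = "/tmp/fork-fd-demo-XXXXXX";
    int status;
    off_t off = -1;
    pid_t pid;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);                 /* the file lives on until fd is closed */

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: write through the descriptor inherited from the parent. */
        ssize_t n = write(fd, "child\n", 6);
        _exit(n == 6 ? 0 : 1);
    }

    /* Parent: it wrote nothing itself, yet its offset has moved. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        off = lseek(fd, 0, SEEK_CUR);
    }
    close(fd);
    return (long)off;
}
```

Because the offset is shared, records written by a write-out thread in each process would land in one file, interleaved in whatever order the writes happen.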
* Re: [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) 2018-07-24 14:35 ` Daniel P. Berrangé @ 2018-07-24 14:41 ` Daniel P. Berrangé 2018-07-26 14:18 ` Stefan Hajnoczi 0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2018-07-24 14:41 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Peter Maydell, qemu-devel

On Tue, Jul 24, 2018 at 03:35:51PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 24, 2018 at 03:25:04PM +0100, Stefan Hajnoczi wrote:
> > The simple trace backend spawns a write-out thread which is used to
> > asynchronously flush the in-memory ring buffer to disk.
> >
> > fork(2) does not clone all threads, only the thread that invoked
> > fork(2). As a result there is no write-out thread in the child process!
> >
> > This causes a hang during shutdown when atexit(3) handler installed by
> > the simple trace backend waits for the non-existent write-out thread.
> >
> > This patch uses pthread_atfork(3) to terminate the write-out thread
> > before fork and restart it in both the parent and child after fork.
> > This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.
>
> I'm not convinced this is safe, as it looks like it has a window in
> which both the parent and child processes will be doing write-out to
> the same file.
>
> In particular in the main QEMU system emulators it means that any
> time we fork() in QEMU, eg for spawning commands with migration
> exec: URI, or TAP device ifup scripts, etc, we'll be starting a
> write-out thread in the child.

I'd be more inclined to have the pthread_atfork() handler simply terminate
the tracing process, reversing all effects of trace_init_backends(). Then
after qemu-nbd has called fork(), it can simply call trace_init_backends()
explicitly to start it running again. This avoids unnecessarily starting
tracing in child processes that are not requiring/expecting it.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
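One possible reading of this suggestion, sketched with plain pthreads: the fork handlers only tear tracing down (and bring it back in the parent), while a child gets tracing only if it asks for it. All names here (`tracer_init`, `tracer_shutdown`, `child_tracing_active`) are hypothetical stand-ins; `trace_init_backends()` is not modelled, and restarting in the parent handler is an assumption rather than something the message spells out.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins; trace_init_backends() itself is not modelled. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kick = PTHREAD_COND_INITIALIZER;
static pthread_t worker;
static bool tracer_active;        /* written only by the forking thread */
static bool restart_in_parent;
static bool handlers_registered;

static void *worker_fn(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&lock);
    while (tracer_active) {
        pthread_cond_wait(&kick, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void tracer_start(void)
{
    if (tracer_active) {
        return;
    }
    tracer_active = true;
    if (pthread_create(&worker, NULL, worker_fn, NULL) != 0) {
        tracer_active = false;
    }
}

static void tracer_shutdown(void)
{
    if (!tracer_active) {
        return;
    }
    pthread_mutex_lock(&lock);
    tracer_active = false;
    pthread_cond_signal(&kick);
    pthread_mutex_unlock(&lock);
    pthread_join(worker, NULL);
}

/* prepare handler: remember the state, then tear tracing down. */
static void before_fork(void)
{
    restart_in_parent = tracer_active;
    tracer_shutdown();
}

/* parent handler: bring tracing back only if it was running before. */
static void after_fork_parent(void)
{
    if (restart_in_parent) {
        tracer_start();
    }
}

static bool tracer_init(void)
{
    tracer_start();
    if (tracer_active && !handlers_registered) {
        /* No child handler: a forked child starts with tracing torn down. */
        handlers_registered =
            pthread_atfork(before_fork, after_fork_parent, NULL) == 0;
    }
    return tracer_active;
}

/* Fork; the child optionally re-initializes and reports 1 if tracing runs. */
static int child_tracing_active(bool reinit)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        if (reinit) {
            tracer_init();
        }
        int active = tracer_active ? 1 : 0;
        tracer_shutdown();
        _exit(active);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

With this shape, a helper process forked for an ifup script never gets a write-out thread, while a daemonizing child such as the one created by qemu-nbd --fork would call the init function itself after fork().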
* Re: [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) 2018-07-24 14:41 ` Daniel P. Berrangé @ 2018-07-26 14:18 ` Stefan Hajnoczi 0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2018-07-26 14:18 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: Peter Maydell, qemu-devel

On Tue, Jul 24, 2018 at 03:41:01PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 24, 2018 at 03:35:51PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jul 24, 2018 at 03:25:04PM +0100, Stefan Hajnoczi wrote:
> > > The simple trace backend spawns a write-out thread which is used to
> > > asynchronously flush the in-memory ring buffer to disk.
> > >
> > > fork(2) does not clone all threads, only the thread that invoked
> > > fork(2). As a result there is no write-out thread in the child process!
> > >
> > > This causes a hang during shutdown when atexit(3) handler installed by
> > > the simple trace backend waits for the non-existent write-out thread.
> > >
> > > This patch uses pthread_atfork(3) to terminate the write-out thread
> > > before fork and restart it in both the parent and child after fork.
> > > This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.
> >
> > I'm not convinced this is safe, as it looks like it has a window in
> > which both the parent and child processes will be doing write-out to
> > the same file.
> >
> > In particular in the main QEMU system emulators it means that any
> > time we fork() in QEMU, eg for spawning commands with migration
> > exec: URI, or TAP device ifup scripts, etc, we'll be starting a
> > write-out thread in the child.
>
> I'd be more inclined to have the pthread_atfork() handler simply terminate
> the tracing process, reversing all effects of trace_init_backends(). Then
> after qemu-nbd has called fork(), it can simply call trace_init_backends()
> explicitly to start it running again. This avoids unnecessarily starting
> tracing in child processes that are not requiring/expecting it.

Good point.

NACK

Stefan
* Re: [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) 2018-07-24 14:25 ` [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) Stefan Hajnoczi 2018-07-24 14:35 ` Daniel P. Berrangé @ 2018-07-24 15:16 ` Eric Blake 1 sibling, 0 replies; 8+ messages in thread
From: Eric Blake @ 2018-07-24 15:16 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: Peter Maydell

On 07/24/2018 09:25 AM, Stefan Hajnoczi wrote:
> The simple trace backend spawns a write-out thread which is used to
> asynchronously flush the in-memory ring buffer to disk.
>
> fork(2) does not clone all threads, only the thread that invoked
> fork(2). As a result there is no write-out thread in the child process!
>
> This causes a hang during shutdown when atexit(3) handler installed by
> the simple trace backend waits for the non-existent write-out thread.
>
> This patch uses pthread_atfork(3) to terminate the write-out thread
> before fork and restart it in both the parent and child after fork.
> This solves a hang in qemu-iotests 147 due to qemu-nbd --fork usage.
>
> Reported-by: Cornelia Huck <cohuck@redhat.com>
> Tested-by: Cornelia Huck <cohuck@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Message-id: 20180717101944.11691-1-stefanha@redhat.com
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---

> +static void restart_writeout_thread(void)
> +{
> +    trace_writeout_running = true;
> +    trace_writeout_thread = trace_thread_create(writeout_thread);
> +    if (!trace_writeout_thread) {
> +        warn_report("unable to initialize simple trace backend");
> +    }
> +
> +    /* This relies on undefined behavior in the fork() child (it's fine in the
> +     * fork() parent). g_mutex_unlock() on a mutex acquired by another thread
> +     * is undefined (see glib documentation).
> +     */
> +    g_mutex_unlock(&trace_lock);

Dan's point about stopping tracing prior to fork, then restarting it
from scratch in both the parent and in specific children, would also get
rid of this risky non-portable behavior of trying to manipulate a mutex
acquired by the parent process' thread.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] [PULL for-3.0 0/1] Tracing patches 2018-07-24 14:25 [Qemu-devel] [PULL for-3.0 0/1] Tracing patches Stefan Hajnoczi 2018-07-24 14:25 ` [Qemu-devel] [PULL for-3.0 1/1] trace/simple: fix hang in child after fork(2) Stefan Hajnoczi @ 2018-07-24 15:13 ` Eric Blake 2018-07-24 19:54 ` Peter Maydell 1 sibling, 1 reply; 8+ messages in thread
From: Eric Blake @ 2018-07-24 15:13 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel; +Cc: Peter Maydell

On 07/24/2018 09:25 AM, Stefan Hajnoczi wrote:
> The following changes since commit 768cef2974fb1fa30dd188b043ea737e13fea477:
>
> Merge remote-tracking branch 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2018-07-24 10:37:52 +0100)
>
> are available in the Git repository at:
>
> git://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to b6ad6a528f2356b357bd1d210048dd7988dc9a7b:
>
> trace/simple: fix hang in child after fork(2) (2018-07-24 14:27:51 +0100)
>
> ----------------------------------------------------------------
> Pull request
>
> Fix qemu-iotests 147 when QEMU was built with ./configure --enable-trace-backend=simple.

Design conversation ongoing under patch 1/1 - replying here to make sure
this doesn't get applied if we aren't ready for it yet.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] [PULL for-3.0 0/1] Tracing patches 2018-07-24 15:13 ` [Qemu-devel] [PULL for-3.0 0/1] Tracing patches Eric Blake @ 2018-07-24 19:54 ` Peter Maydell 0 siblings, 0 replies; 8+ messages in thread
From: Peter Maydell @ 2018-07-24 19:54 UTC (permalink / raw)
To: Eric Blake; +Cc: Stefan Hajnoczi, QEMU Developers

On 24 July 2018 at 16:13, Eric Blake <eblake@redhat.com> wrote:
> On 07/24/2018 09:25 AM, Stefan Hajnoczi wrote:
>>
>> The following changes since commit
>> 768cef2974fb1fa30dd188b043ea737e13fea477:
>>
>> Merge remote-tracking branch
>> 'remotes/ehabkost/tags/x86-next-pull-request' into staging (2018-07-24
>> 10:37:52 +0100)
>>
>> are available in the Git repository at:
>>
>> git://github.com/stefanha/qemu.git tags/tracing-pull-request
>>
>> for you to fetch changes up to b6ad6a528f2356b357bd1d210048dd7988dc9a7b:
>>
>> trace/simple: fix hang in child after fork(2) (2018-07-24 14:27:51
>> +0100)
>>
>> ----------------------------------------------------------------
>> Pull request
>>
>> Fix qemu-iotests 147 when QEMU was built with ./configure
>> --enable-trace-backend=simple.
>
>
> Design conversation ongoing under patch 1/1 - replying here to make sure
> this doesn't get applied if we aren't ready for it yet.

OK; I'll drop this pullreq from my queue. Please ping me again
if the outcome is that it should be applied.

thanks
-- PMM