From: Steven Sistare <steven.sistare@oracle.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Michael Roth <michael.roth@amd.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH] monitor: flush messages on abort
Date: Fri, 3 Nov 2023 15:51:00 -0400 [thread overview]
Message-ID: <3d45ebc0-de9f-4051-9c08-47e40fea65da@oracle.com> (raw)
In-Reply-To: <ZUUu2IuUQ/Od7+Vr@redhat.com>
On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>> Buffered monitor output is lost when abort() is called. The pattern
>> error_report() followed by abort() occurs about 60 times, so valuable
>> information is being lost when the abort is called in the context of a
>> monitor command.
>
> I'm curious, was there a particular abort() scenario that you hit ?
Yes, while tweaking the suspended state, and forgetting to add transitions:
error_report("invalid runstate transition: '%s' -> '%s'",
abort();
But I have previously hit this for other errors.
> For some crude statistics:
>
> $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 error_report | grep $i | wc -l ; done
> abort: 47
> return: 512
> exit: 458
> goto: 177
>
> to me those numbers say that calling "abort()" after error_report
> should be considered a bug, and we can blanket replace all the
> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
> special case flushing the monitor.
And presumably add an atexit handler to flush the monitor ala monitor_abort.
AFAICT currently no destructor is called for the monitor at exit time.
> Also I think there's a decent case to be made for error_report()
> to call monitor_flush().
A good start, but that would not help for monitors with skip_flush=true, which
need to format the buffered string in a json response, which is the case I
tripped over.
>> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> monitor/monitor.c | 38 ++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 38 insertions(+)
>>
>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>> index dc352f9..65dace0 100644
>> --- a/monitor/monitor.c
>> +++ b/monitor/monitor.c
>> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
>> }
>> }
>>
>> +#ifdef CONFIG_LINUX
>> +
>> +static void monitor_abort(int signal, siginfo_t *info, void *c)
>> +{
>> + Monitor *mon = monitor_cur();
>> +
>> + if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
>> + return;
>> + }
>> +
>> + if (mon->outbuf && mon->outbuf->len) {
>> + fputs("SIGABRT received: ", stderr);
>> + fputs(mon->outbuf->str, stderr);
>> + if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
>> + fputc('\n', stderr);
>> + }
>> + }
>> +
>> + qemu_mutex_unlock(&mon->mon_lock);
>
> The SIGABRT handling does not only fire in response to abort()
> calls, but also in response to bad memory scenarios, so we have
> to be careful what we do in signal handlers.
>
> In particular using mutexes in signal handlers is a big red
> flag generally. Mutex APIs are not declare async signal
> safe, so this code is technically a POSIX compliance
> violation.
Righto. I would need to mask all signals in the sigaction to be on the safe(r) side.
> So I think we'd be safer just eliminating the explicit abort()
> calls and adding monitor_flush call to error_report.
I like adding a handler because it is future proof. No need to play whack-a-mole when
developers re-introduce abort() calls in the future. A minor benefit is I would not
need ack's from 50 maintainers to change 50 call sites from abort to exit.
A slight risk of the exit solution is that something bad happened at the call site, so
qemu state can no longer be trusted. Calling abort immediately may be safer than calling
exit which will call the existing atexit handlers and could have side effects.
A third option is to define qemu_abort() which flushes the monitor, and replaces all abort
calls. That avoids async-signal-mutex hand wringing, but is still subject to whack-a-mole.
So: atexit, signal handler, or qemu_abort? I will go with your preference.
- Steve
>> +}
>> +
>> +static void monitor_add_abort_handler(void)
>> +{
>> + struct sigaction act;
>> +
>> + memset(&act, 0, sizeof(act));
>> + act.sa_sigaction = monitor_abort;
>> + act.sa_flags = SA_SIGINFO;
>> + sigaction(SIGABRT, &act, NULL);
>> +}
>> +
>> +#else
>> +
>> +static void monitor_add_abort_handler(void) {}
>> +
>> +#endif
>> +
>> static void monitor_qapi_event_init(void)
>> {
>> monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
>> @@ -712,6 +749,7 @@ void monitor_init_globals(void)
>> monitor_qapi_event_init();
>> qemu_mutex_init(&monitor_lock);
>> coroutine_mon = g_hash_table_new(NULL, NULL);
>> + monitor_add_abort_handler();
>>
>> /*
>> * The dispatcher BH must run in the main loop thread, since we
>> --
>> 1.8.3.1
>>
>>
>
> With regards,
> Daniel
next prev parent reply other threads:[~2023-11-03 19:52 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-03 16:01 [PATCH] monitor: flush messages on abort Steve Sistare
2023-11-03 17:33 ` Daniel P. Berrangé
2023-11-03 19:51 ` Steven Sistare [this message]
2023-11-03 20:28 ` Steven Sistare
2023-11-06 10:10 ` Daniel P. Berrangé
2023-11-15 8:41 ` Markus Armbruster
2023-11-15 15:52 ` Steven Sistare
2023-11-15 16:15 ` Markus Armbruster
2023-11-15 17:41 ` Steven Sistare
2023-11-16 15:14 ` Markus Armbruster
2023-11-16 17:03 ` Steven Sistare
2023-11-15 15:30 ` Steven Sistare
2023-11-15 16:05 ` Markus Armbruster
2023-11-16 11:21 ` Daniel P. Berrangé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3d45ebc0-de9f-4051-9c08-47e40fea65da@oracle.com \
--to=steven.sistare@oracle.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dave@treblig.org \
--cc=michael.roth@amd.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).