From: Steven Sistare <steven.sistare@oracle.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
Michael Roth <michael.roth@amd.com>,
"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH] monitor: flush messages on abort
Date: Fri, 3 Nov 2023 16:28:56 -0400 [thread overview]
Message-ID: <2e365128-2b02-4de0-976b-b4a573ae2d97@oracle.com> (raw)
In-Reply-To: <3d45ebc0-de9f-4051-9c08-47e40fea65da@oracle.com>
On 11/3/2023 3:51 PM, Steven Sistare wrote:
> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>>> Buffered monitor output is lost when abort() is called. The pattern
>>> error_report() followed by abort() occurs about 60 times, so valuable
>>> information is being lost when the abort is called in the context of a
>>> monitor command.
>>
>> I'm curious, was there a particular abort() scenario that you hit ?
>
> Yes, while tweaking the suspended state, and forgetting to add transitions:
>
> error_report("invalid runstate transition: '%s' -> '%s'",
> abort();
>
> But I have previously hit this for other errors.
>
>> For some crude statistics:
>>
>> $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 error_report | grep $i | wc -l ; done
>> abort: 47
>> return: 512
>> exit: 458
>> goto: 177
>>
>> to me those numbers say that calling "abort()" after error_report
>> should be considered a bug, and we can blanket replace all the
>> abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
>> special case flushing the monitor.
>
> And presumably add an atexit handler to flush the monitor ala monitor_abort.
> AFAICT currently no destructor is called for the monitor at exit time.
>
>> Also I think there's a decent case to be made for error_report()
>> to call monitor_flush().
>
> A good start, but that would not help for monitors with skip_flush=true, which
> need to format the buffered string in a json response, which is the case I
> tripped over.
>
>>> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
>>>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> monitor/monitor.c | 38 ++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 38 insertions(+)
>>>
>>> diff --git a/monitor/monitor.c b/monitor/monitor.c
>>> index dc352f9..65dace0 100644
>>> --- a/monitor/monitor.c
>>> +++ b/monitor/monitor.c
>>> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
>>> }
>>> }
>>>
>>> +#ifdef CONFIG_LINUX
>>> +
>>> +static void monitor_abort(int signal, siginfo_t *info, void *c)
>>> +{
>>> + Monitor *mon = monitor_cur();
>>> +
>>> + if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
>>> + return;
>>> + }
>>> +
>>> + if (mon->outbuf && mon->outbuf->len) {
>>> + fputs("SIGABRT received: ", stderr);
>>> + fputs(mon->outbuf->str, stderr);
>>> + if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
>>> + fputc('\n', stderr);
>>> + }
>>> + }
>>> +
>>> + qemu_mutex_unlock(&mon->mon_lock);
>>
>> The SIGABRT handling does not only fire in response to abort()
>> calls, but also in response to bad memory scenarios, so we have
>> to be careful what we do in signal handlers.
>>
>> In particular using mutexes in signal handlers is a big red
>> flag generally. Mutex APIs are not declare async signal
>> safe, so this code is technically a POSIX compliance
>> violation.
>
> Righto. I would need to mask all signals in the sigaction to be on the safe(r) side.
>
>> So I think we'd be safer just eliminating the explicit abort()
>> calls and adding monitor_flush call to error_report.
>
> I like adding a handler because it is future proof. No need to play whack-a-mole when
> developers re-introduce abort() calls in the future. A minor benefit is I would not
> need ack's from 50 maintainers to change 50 call sites from abort to exit.
>
> A slight risk of the exit solution is that something bad happened at the call site, so
> qemu state can no longer be trusted. Calling abort immediately may be safer than calling
> exit which will call the existing atexit handlers and could have side effects.
>
> A third option is to define qemu_abort() which flushes the monitor, and replaces all abort
> calls. That avoids async-signal-mutex hand wringing, but is still subject to whack-a-mole.
>
> So: atexit, signal handler, or qemu_abort? I will go with your preference.
If I go with signal handler, I would also add atexit to flush the existing call sites of
error_report() + exit().
One more tidbit: the signal handler would print pending messages if any of the 11000 assert()
calls fails, though having a message present is less likely in this case.
- Steve
>>> +}
>>> +
>>> +static void monitor_add_abort_handler(void)
>>> +{
>>> + struct sigaction act;
>>> +
>>> + memset(&act, 0, sizeof(act));
>>> + act.sa_sigaction = monitor_abort;
>>> + act.sa_flags = SA_SIGINFO;
>>> + sigaction(SIGABRT, &act, NULL);
>>> +}
>>> +
>>> +#else
>>> +
>>> +static void monitor_add_abort_handler(void) {}
>>> +
>>> +#endif
>>> +
>>> static void monitor_qapi_event_init(void)
>>> {
>>> monitor_qapi_event_state = g_hash_table_new(qapi_event_throttle_hash,
>>> @@ -712,6 +749,7 @@ void monitor_init_globals(void)
>>> monitor_qapi_event_init();
>>> qemu_mutex_init(&monitor_lock);
>>> coroutine_mon = g_hash_table_new(NULL, NULL);
>>> + monitor_add_abort_handler();
>>>
>>> /*
>>> * The dispatcher BH must run in the main loop thread, since we
>>> --
>>> 1.8.3.1
>>>
>>>
>>
>> With regards,
>> Daniel
next prev parent reply other threads:[~2023-11-03 20:30 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-03 16:01 [PATCH] monitor: flush messages on abort Steve Sistare
2023-11-03 17:33 ` Daniel P. Berrangé
2023-11-03 19:51 ` Steven Sistare
2023-11-03 20:28 ` Steven Sistare [this message]
2023-11-06 10:10 ` Daniel P. Berrangé
2023-11-15 8:41 ` Markus Armbruster
2023-11-15 15:52 ` Steven Sistare
2023-11-15 16:15 ` Markus Armbruster
2023-11-15 17:41 ` Steven Sistare
2023-11-16 15:14 ` Markus Armbruster
2023-11-16 17:03 ` Steven Sistare
2023-11-15 15:30 ` Steven Sistare
2023-11-15 16:05 ` Markus Armbruster
2023-11-16 11:21 ` Daniel P. Berrangé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2e365128-2b02-4de0-976b-b4a573ae2d97@oracle.com \
--to=steven.sistare@oracle.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dave@treblig.org \
--cc=michael.roth@amd.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).