qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Steven Sistare <steven.sistare@oracle.com>
Cc: qemu-devel@nongnu.org, Markus Armbruster <armbru@redhat.com>,
	Michael Roth <michael.roth@amd.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH] monitor: flush messages on abort
Date: Mon, 6 Nov 2023 10:10:19 +0000	[thread overview]
Message-ID: <ZUi7izJoVpU+iiuC@redhat.com> (raw)
In-Reply-To: <3d45ebc0-de9f-4051-9c08-47e40fea65da@oracle.com>

On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
> > On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
> >> Buffered monitor output is lost when abort() is called.  The pattern
> >> error_report() followed by abort() occurs about 60 times, so valuable
> >> information is being lost when the abort is called in the context of a
> >> monitor command.
> > 
> > I'm curious, was there a particular abort() scenario that you hit ?
> 
> Yes, while tweaking the suspended state, and forgetting to add transitions:
> 
>         error_report("invalid runstate transition: '%s' -> '%s'",
>         abort();
> 
> But I have previously hit this for other errors.
> 
> > For some crude statistics:
> > 
> >   $ for i in abort return exit goto ; do echo -n "$i: " ; git grep --after 1 error_report | grep $i | wc -l ; done
> >   abort: 47
> >   return: 512
> >   exit: 458
> >   goto: 177
> > 
> > to me those numbers say that calling "abort()" after error_report
> > should be considered a bug, and we can blanket replace all the
> > abort() calls with exit(EXIT_FAILURE), and thus avoid the need to
> > special case flushing the monitor.
> 
> And presumably add an atexit handler to flush the monitor ala monitor_abort.
> AFAICT currently no destructor is called for the monitor at exit time.

The HMP monitor flushes at each newline,  and exit() will take care of
flushing stdout, so I don't think there's anything else needed.

> > Also I think there's a decent case to be made for error_report()
> > to call monitor_flush().
> 
> A good start, but that would not help for monitors with skip_flush=true, which 
> need to format the buffered string in a json response, which is the case I 
> tripped over.

'skip_flush' is only set to 'true' when using a QMP monitor and invoking
"hmp-monitor-command".

In such a case, the error message needs to be built into a JSON error
reply and sent over the socket. Your patch doesn't help this case
since you've just printed to stderr.  I don't think it is reasonable
to expect QMP monitors to send replies on SIG_ABRT anyway. So I don't
think the skip_flush=true scenario is a problem to be concerned with.

> >> To fix, install a SIGABRT handler to flush the monitor buffer to stderr.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  monitor/monitor.c | 38 ++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 38 insertions(+)
> >>
> >> diff --git a/monitor/monitor.c b/monitor/monitor.c
> >> index dc352f9..65dace0 100644
> >> --- a/monitor/monitor.c
> >> +++ b/monitor/monitor.c
> >> @@ -701,6 +701,43 @@ void monitor_cleanup(void)
> >>      }
> >>  }
> >>  
> >> +#ifdef CONFIG_LINUX
> >> +
> >> +static void monitor_abort(int signal, siginfo_t *info, void *c)
> >> +{
> >> +    Monitor *mon = monitor_cur();
> >> +
> >> +    if (!mon || qemu_mutex_trylock(&mon->mon_lock)) {
> >> +        return;
> >> +    }
> >> +
> >> +    if (mon->outbuf && mon->outbuf->len) {
> >> +        fputs("SIGABRT received: ", stderr);
> >> +        fputs(mon->outbuf->str, stderr);
> >> +        if (mon->outbuf->str[mon->outbuf->len - 1] != '\n') {
> >> +            fputc('\n', stderr);
> >> +        }
> >> +    }
> >> +
> >> +    qemu_mutex_unlock(&mon->mon_lock);
> > 
> > The SIGABRT handling does not only fire in response to abort()
> > calls, but also in response to bad memory scenarios, so we have
> > to be careful what we do in signal handlers.
> > 
> > In particular using mutexes in signal handlers is a big red
> > flag generally. Mutex APIs are not declare async signal
> > safe, so this code is technically a POSIX compliance
> > violation.
> 
> Righto.  I would need to mask all signals in the sigaction to be on the safe(r) side.

This is still doomed, because SIGABRT could fire while 'mon_lock' is
already held, and so this code would deadlock trying to acquire the
lock.

> > So I think we'd be safer just eliminating the explicit abort()
> > calls and adding monitor_flush call to error_report.
> 
> I like adding a handler because it is future proof.  No need to play whack-a-mole when
> developers re-introduce abort() calls in the future.  A minor benefit is I would not
> need ack's from 50 maintainers to change 50 call sites from abort to exit.

That's a bit of a crazy exaggeration. THe aborts() don't cover 50 different
subsystems, and we don't require explicit acks from every subsystem maintainer
for trivial cleanups like this.

> A slight risk of the exit solution is that something bad happened at the call site, so 
> qemu state can no longer be trusted.  Calling abort immediately may be safer than calling 
> exit which will call the existing atexit handlers and could have side effects.

If that was a real problem, then we already face it because we have
~500 places already calling exit() and only 50 calling abort().

> A third option is to define qemu_abort() which flushes the monitor, and replaces all abort
> calls.  That avoids async-signal-mutex hand wringing, but is still subject to whack-a-mole.
> 
> So: atexit, signal handler, or qemu_abort?  I will go with your preference.

Just replace abort -> exit.

I'm not seeing a need for an atexit handler on top.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



  parent reply	other threads:[~2023-11-06 10:11 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-03 16:01 [PATCH] monitor: flush messages on abort Steve Sistare
2023-11-03 17:33 ` Daniel P. Berrangé
2023-11-03 19:51   ` Steven Sistare
2023-11-03 20:28     ` Steven Sistare
2023-11-06 10:10     ` Daniel P. Berrangé [this message]
2023-11-15  8:41       ` Markus Armbruster
2023-11-15 15:52         ` Steven Sistare
2023-11-15 16:15           ` Markus Armbruster
2023-11-15 17:41             ` Steven Sistare
2023-11-16 15:14               ` Markus Armbruster
2023-11-16 17:03                 ` Steven Sistare
2023-11-15 15:30       ` Steven Sistare
2023-11-15 16:05         ` Markus Armbruster
2023-11-16 11:21         ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZUi7izJoVpU+iiuC@redhat.com \
    --to=berrange@redhat.com \
    --cc=armbru@redhat.com \
    --cc=dave@treblig.org \
    --cc=michael.roth@amd.com \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).