qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	qemu-devel@nongnu.org, "Michael Roth" <michael.roth@amd.com>,
	"Dr. David Alan Gilbert" <dave@treblig.org>
Subject: Re: [PATCH] monitor: flush messages on abort
Date: Thu, 16 Nov 2023 12:03:28 -0500	[thread overview]
Message-ID: <45bfd9b2-09dd-471f-9c54-b7f746bf0baf@oracle.com> (raw)
In-Reply-To: <8734x5ennk.fsf@pond.sub.org>

On 11/16/2023 10:14 AM, Markus Armbruster wrote:
> Steven Sistare <steven.sistare@oracle.com> writes:
>> On 11/15/2023 11:15 AM, Markus Armbruster wrote:
>>> Steven Sistare <steven.sistare@oracle.com> writes:
>>>> On 11/15/2023 3:41 AM, Markus Armbruster wrote:
>>>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>>>
>>>>>> On Fri, Nov 03, 2023 at 03:51:00PM -0400, Steven Sistare wrote:
>>>>>>> On 11/3/2023 1:33 PM, Daniel P. Berrangé wrote:
>>>>>>>> On Fri, Nov 03, 2023 at 09:01:29AM -0700, Steve Sistare wrote:
>>>>>>>>> Buffered monitor output is lost when abort() is called.  The pattern
>>>>>>>>> error_report() followed by abort() occurs about 60 times, so valuable
>>>>>>>>> information is being lost when the abort is called in the context of a
>>>>>>>>> monitor command.
>>>>>>>>
>>>>>>>> I'm curious, was there a particular abort() scenario that you hit ?
>>>>>>>
>>>>>>> Yes, while tweaking the suspended state, and forgetting to add transitions:
>>>>>>>
>>>>>>>         error_report("invalid runstate transition: '%s' -> '%s'",
>>>>>>>         abort();
>>>>>>>
>>>>>>> But I have previously hit this for other errors.
>>>>>
>>>>> Can you provide a reproducer?
>>>>
>>>> I sometimes hit this when developing new code.  I do not have a reproducer for upstream
>>>> branches. The patch is aimed at helping developers, not users.
>>>
>>> I'm asking because I can't see how the error message could be lost.  A
>>> reproducer would let me find out.  "Apply this set of broken patches,
>>> then do that" would serve.
>>
>> $ patch -p1 << EOF
>> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
>> index b0f948d..c9a3aee 100644
>> --- a/monitor/qmp-cmds.c
>> +++ b/monitor/qmp-cmds.c
>> @@ -47,8 +47,12 @@ void qmp_quit(Error **errp)
>>      qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_QMP_QUIT);
>>  }
>>
>> +#include "qemu/error-report.h"
>> +
>>  void qmp_stop(Error **errp)
>>  {
>> +    error_report("injected failure");
>> +    abort();
>>      /* if there is a dump in background, we should wait until the dump
>>       * finished */
>>      if (qemu_system_dump_in_progress()) {
>> EOF
>>
>> # This example loses the error message:
>>
>> $ args='-display none -chardev socket,id=mon1,server=on,path=mon1.sock,wait=off -mon mon1,mode=control'
>> $ qemu-system-x86_64 $args < /dev/null &
>> [1] 18048
>> $ echo '{"execute":"qmp_capabilities"} {"execute":"human-monitor-command","arguments":{"command-line":"stop"}}' | ncat -U mon1.sock
>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 8}, "package": "v8.1.0-2976-g4025fde-dirty"}, "capabilities": ["oob"]}}
>> {"return": {}}
>> Ncat: Connection reset by peer.
>> $
>> [1]+  Aborted                 qemu-system-x86_64 $args < /dev/null
>>
>>
>> # This example preserves the error message. I include it to show the ncat-based test is valid.
>>
>> $ qemu-system-x86_64 $args < /dev/null &
>> [1] 18060
>> $ echo '{"execute":"qmp_capabilities"} {"execute":"stop"}' | ncat -U mon1.sock
>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 1, "major": 8}, "package": "v8.1.0-2976-g4025fde-dirty"}, "capabilities": ["oob"]}}
>> {"return": {}}
>> injected failure                <============= qemu stderr
>> Ncat: Connection reset by peer.
>> $
>> [1]+  Aborted                 qemu-system-x86_64 $args < /dev/null
>>
>> - Steve
> 
> Reproduced, thanks!
> 
> Confirms my reading of the code.  Two cases:
> 
> 1. Normal monitor output
> 
>    Flushed on newline.  A partial last line can be lost on crash or
>    exit().
> 
>    Works as intended.
> 
> 2. Output being captured for QMP command human-monitor-command
> 
>    Never flushed.  Instead, the entire buffer is included in the
>    command's success response.  Naturally there is no response on crash.
> 
>    Works as intended.
> 
> Here's how to fish unflushed output out of a core dump:
> 
>     (gdb) bt
>     #0  0x00007efeba52fecc in __pthread_kill_implementation () at /lib64/libc.so.6
>     #1  0x00007efeba4dfab6 in raise () at /lib64/libc.so.6
>     #2  0x00007efeba4c97fc in abort () at /lib64/libc.so.6
>     #3  0x000055588dcc848c in qmp_stop (errp=0x0) at ../monitor/qmp-cmds.c:53
>     #4  0x000055588dcc36ab in hmp_stop (mon=0x7ffc01d47cc0, qdict=0x555891d19200)
>         at ../monitor/hmp-cmds.c:119
>     #5  0x000055588dcc714e in handle_hmp_command_exec
>         (mon=0x7ffc01d47cc0, cmd=0x55588f0689d0 <hmp_cmds+7760>, qdict=0x555891d19200) at ../monitor/hmp.c:1106
>     #6  0x000055588dcc737b in handle_hmp_command
>         (mon=0x7ffc01d47cc0, cmdline=0x7efea80036f4 "") at ../monitor/hmp.c:1158
>     #7  0x000055588dcc8922 in qmp_human_monitor_command
>         (command_line=0x7efea80036f0 "stop", has_cpu_index=false, cpu_index=0, errp=0x7ffc01d47dd0) at ../monitor/qmp-cmds.c:184
> 
>     [...]
> 
>     (gdb) up 4
>     #4  0x000055588dcc36ab in hmp_stop (mon=0x7ffc01d47cc0, qdict=0x555891d19200)
>         at ../monitor/hmp-cmds.c:119
>     119	    qmp_stop(NULL);
>     (gdb) p mon->outbuf
>     $1 = (GString *) 0x555890a65260
>     (gdb) p *$
>     $2 = {str = 0x7efea8002c00 "injected failure\r\n", len = 18, 
>       allocated_len = 128}

Yes. thanks.  With the patch, the computer does the work for us, and sometimes the message
is definitive enough to fix without further investigation, eg
  invalid runstate transition: 'suspended -> 'save-vm'

But, it's no big deal, and I withdraw the patch request.  Thanks for looking at it.

- Steve


  reply	other threads:[~2023-11-16 17:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-03 16:01 [PATCH] monitor: flush messages on abort Steve Sistare
2023-11-03 17:33 ` Daniel P. Berrangé
2023-11-03 19:51   ` Steven Sistare
2023-11-03 20:28     ` Steven Sistare
2023-11-06 10:10     ` Daniel P. Berrangé
2023-11-15  8:41       ` Markus Armbruster
2023-11-15 15:52         ` Steven Sistare
2023-11-15 16:15           ` Markus Armbruster
2023-11-15 17:41             ` Steven Sistare
2023-11-16 15:14               ` Markus Armbruster
2023-11-16 17:03                 ` Steven Sistare [this message]
2023-11-15 15:30       ` Steven Sistare
2023-11-15 16:05         ` Markus Armbruster
2023-11-16 11:21         ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45bfd9b2-09dd-471f-9c54-b7f746bf0baf@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dave@treblig.org \
    --cc=michael.roth@amd.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).