qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: qemu-devel <qemu-devel@nongnu.org>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	Eduardo Habkost <ehabkost@redhat.com>,
	qemu-block@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Cleber Rosa <crosa@redhat.com>, Eric Blake <eblake@redhat.com>
Subject: Re: [PATCH v2 12/17] python/machine: Handle QMP errors on close more meticulously
Date: Thu, 7 Oct 2021 12:52:18 -0400	[thread overview]
Message-ID: <CAFn=p-YwL+v7_rvC40z_T3DiKhH0cdgu53KTghwTjmixakHC+Q@mail.gmail.com> (raw)
In-Reply-To: <20210923004938.3999963-13-jsnow@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5691 bytes --]

On Wed, Sep 22, 2021 at 8:50 PM John Snow <jsnow@redhat.com> wrote:

> To use the AQMP backend, Machine just needs to be a little more diligent
> about what happens when closing a QMP connection. The operation is no
> longer a freebie in the async world; it may return errors encountered in
> the async bottom half on incoming message receipt, etc.
>
> (AQMP's disconnect, ultimately, serves as the quiescence point where all
> async contexts are gathered together, and any final errors reported at
> that point.)
>
> Because async QMP continues to check for messages asynchronously, it's
> almost certainly likely that the loop will have exited due to EOF after
> issuing the last 'quit' command. That error will ultimately be bubbled
> up when attempting to close the QMP connection. The manager class here
> then is free to discard it -- if it was expected.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
>
> ---
>
> Yes, I regret that this class has become quite a dumping ground for
> complexity around the exit path. It's in need of a refactor to help
> separate out the exception handling and cleanup mechanisms from the
> VM-related stuff, but it's not a priority to do that just yet -- but
> it's on the list.
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  python/qemu/machine/machine.py | 48 +++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 6 deletions(-)
>
> diff --git a/python/qemu/machine/machine.py
> b/python/qemu/machine/machine.py
> index 0bd40bc2f76..c33a78a2d9f 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
>          # Comprehensive reset for the failed launch case:
>          self._early_cleanup()
>
> -        if self._qmp_connection:
> -            self._qmp.close()
> -            self._qmp_connection = None
> +        try:
> +            self._close_qmp_connection()
> +        except Exception as err:  # pylint: disable=broad-except
> +            LOG.warning(
> +                "Exception closing QMP connection: %s",
> +                str(err) if str(err) else type(err).__name__
> +            )
> +        finally:
> +            assert self._qmp_connection is None
>
>          self._close_qemu_log_file()
>
> @@ -420,6 +426,31 @@ def _launch(self) -> None:
>                                         close_fds=False)
>          self._post_launch()
>
> +    def _close_qmp_connection(self) -> None:
> +        """
> +        Close the underlying QMP connection, if any.
> +
> +        Dutifully report errors that occurred while closing, but assume
> +        that any error encountered indicates an abnormal termination
> +        process and not a failure to close.
> +        """
> +        if self._qmp_connection is None:
> +            return
> +
> +        try:
> +            self._qmp.close()
> +        except EOFError:
> +            # EOF can occur as an Exception here when using the Async
> +            # QMP backend. It indicates that the server closed the
> +            # stream. If we successfully issued 'quit' at any point,
> +            # then this was expected. If the remote went away without
> +            # our permission, it's worth reporting that as an abnormal
> +            # shutdown case.
> +            if not self._quit_issued:
>

I strongly suspect that this line needs to actually be "if not
(self._user_killed or self._quit_issued):" to also handle the force-kill
cases.

I wrote a different sync compatibility layer in a yet-to-be-published
branch and noticed this code still allows EOFError through. Why it did not
seem to come up in *this* series I still don't know -- I think the timing
of when exactly the coroutines get scheduled can be finnicky depending on
what else is running. So, sometimes, we manage to close the loop before we
get EOF and sometimes we don't.

I am mulling over adding a "tolerate_eof: bool = False" argument to
disconnect() to allow the caller to handle this stuff with a little less
boilerplate. Anyone have strong feelings on that?

(Ultimately, because there's no canonical "goodbye" in-band message, the
QMP layer itself can never really know if EOF was expected or not -- that's
really up to whomever is managing the connection, but it does add a layer
of complexity around the exit path that I am starting to find is a
nuisance. Not to mention that there are likely many possible cases in which
we might expect the remote to disappear on us that don't involve using QMP
at all -- kill is one, using the guest agent to coerce the guest into
shutdown is another. Networking and migration are other theoretical(?)
avenues for intended disconnects.)


> +                raise
> +        finally:
> +            self._qmp_connection = None
> +
>      def _early_cleanup(self) -> None:
>          """
>          Perform any cleanup that needs to happen before the VM exits.
> @@ -460,9 +491,14 @@ def _soft_shutdown(self, timeout: Optional[int]) ->
> None:
>          self._early_cleanup()
>
>          if self._qmp_connection:
> -            if not self._quit_issued:
> -                # Might raise ConnectionReset
> -                self.qmp('quit')
> +            try:
> +                if not self._quit_issued:
> +                    # May raise ExecInterruptedError or StateError if the
> +                    # connection dies or has *already* died.
> +                    self.qmp('quit')
> +            finally:
> +                # Regardless, we want to quiesce the connection.
> +                self._close_qmp_connection()
>
>          # May raise subprocess.TimeoutExpired
>          self._subp.wait(timeout=timeout)
> --
> 2.31.1
>
>

[-- Attachment #2: Type: text/html, Size: 7202 bytes --]

  parent reply	other threads:[~2021-10-07 16:56 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-23  0:49 [PATCH v2 00/17] Switch iotests to using Async QMP John Snow
2021-09-23  0:49 ` [PATCH v2 01/17] python/aqmp: add greeting property to QMPClient John Snow
2021-09-23  0:49 ` [PATCH v2 02/17] python/aqmp: add .empty() method to EventListener John Snow
2021-09-23  0:49 ` [PATCH v2 03/17] python/aqmp: Return cleared events from EventListener.clear() John Snow
2021-09-23  0:49 ` [PATCH v2 04/17] python/aqmp: add send_fd_scm John Snow
2021-10-07 14:52   ` Eric Blake
2021-10-07 16:27     ` John Snow
2021-10-07 21:46       ` Eric Blake
2021-09-23  0:49 ` [PATCH v2 05/17] python/aqmp: Add dict conversion method to Greeting object John Snow
2021-10-07 14:53   ` Eric Blake
2021-09-23  0:49 ` [PATCH v2 06/17] python/aqmp: Reduce severity of EOFError-caused loop terminations John Snow
2021-10-07 14:55   ` Eric Blake
2021-09-23  0:49 ` [PATCH v2 07/17] python/aqmp: Disable logging messages by default John Snow
2021-10-07 15:02   ` Eric Blake
2021-09-23  0:49 ` [PATCH v2 08/17] python/qmp: clear events on get_events() call John Snow
2021-09-23  0:49 ` [PATCH v2 09/17] python/qmp: add send_fd_scm directly to QEMUMonitorProtocol John Snow
2021-09-23  0:49 ` [PATCH v2 10/17] python, iotests: remove socket_scm_helper John Snow
2021-09-23  0:49 ` [PATCH v2 11/17] python/machine: remove has_quit argument John Snow
2021-10-12 15:30   ` Hanna Reitz
2021-09-23  0:49 ` [PATCH v2 12/17] python/machine: Handle QMP errors on close more meticulously John Snow
2021-10-07 15:07   ` Eric Blake
2021-10-07 16:34     ` John Snow
2021-10-07 16:52   ` John Snow [this message]
2021-10-12 15:56     ` Hanna Reitz
2021-09-23  0:49 ` [PATCH v2 13/17] iotests: Accommodate async QMP Exception classes John Snow
2021-10-12 16:06   ` Hanna Reitz
2021-10-12 16:38     ` John Snow
2021-09-23  0:49 ` [PATCH v2 14/17] iotests: Conditionally silence certain AQMP errors John Snow
2021-09-23  0:49 ` [PATCH v2 15/17] python/aqmp: Create sync QMP wrapper for iotests John Snow
2021-10-06 10:13   ` Paolo Bonzini
2021-10-06 14:24     ` John Snow
2021-10-06 14:32       ` Paolo Bonzini
2021-10-06 15:12         ` John Snow
2021-09-23  0:49 ` [PATCH v2 16/17] python/aqmp: Remove scary message John Snow
2021-09-23  0:49 ` [PATCH v2 17/17] python, iotests: replace qmp with aqmp John Snow
2021-10-06 10:14 ` [PATCH v2 00/17] Switch iotests to using Async QMP Paolo Bonzini
2021-10-06 15:01   ` John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFn=p-YwL+v7_rvC40z_T3DiKhH0cdgu53KTghwTjmixakHC+Q@mail.gmail.com' \
    --to=jsnow@redhat.com \
    --cc=crosa@redhat.com \
    --cc=eblake@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).