qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Cleber Rosa <crosa@redhat.com>,
	philmd@redhat.com, qemu-devel@nongnu.org,
	Eduardo Habkost <ehabkost@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v2 1/1] python/machine.py: refactor shutdown
Date: Thu, 4 Jun 2020 15:03:57 -0400	[thread overview]
Message-ID: <1a02be9a-0a7c-fedb-2139-e79095b201dd@redhat.com> (raw)
In-Reply-To: <20200604123916.GF4512@linux.fritz.box>



On 6/4/20 8:39 AM, Kevin Wolf wrote:
> Am 02.06.2020 um 21:48 hat John Snow geschrieben:
>> This is done primarily to avoid the 'bare except' pattern, which
>> suppresses ALL exceptions and not just ones that we are anticipating to
>> see.
>>
>> Replace this with a pattern that isolates the different kind of shutdown
>> paradigms and a new fallback shutdown handler that gracefully attempts
>> one before the other.
>>
>> Move all of the post-shutdown code into _post_shutdown, and adjust the
>> main shutdown() function to ALWAYS call the post_shutdown logic no
>> matter what kind of error we encountered: An expected one (subprocess
>> wait timeout) or an unexpected one (Everything else).
>>
>> In cases where we encounter an expected error in the graceful shutdown
>> timeout, we will not re-raise an exception above shutdown(). Otherwise,
>> after post_shutdown cleanup, we will.
>>
>> I anticipate that this WILL lead to additional bug reports filed against
>> this module, but that is unfortunately somewhat the point: This code
>> shouldn't be hiding failures that exist elsewhere within the python
>> code.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
>>  1 file changed, 65 insertions(+), 28 deletions(-)
>>
>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>> index 041c615052e..a2abd2c35e3 100644
>> --- a/python/qemu/machine.py
>> +++ b/python/qemu/machine.py
>> @@ -283,6 +283,8 @@ def _post_launch(self):
>>              self._qmp.accept()
>>  
>>      def _post_shutdown(self):
>> +        self._load_io_log()
>> +
>>          if self._qemu_log_file is not None:
>>              self._qemu_log_file.close()
>>              self._qemu_log_file = None
>> @@ -296,6 +298,17 @@ def _post_shutdown(self):
>>          while len(self._remove_files) > 0:
>>              self._remove_if_exists(self._remove_files.pop())
>>  
>> +        exitcode = self.exitcode()
>> +        if exitcode is not None and exitcode < 0:
>> +            msg = 'qemu received signal %i; command: "%s"'
>> +            if self._qemu_full_args:
>> +                command = ' '.join(self._qemu_full_args)
>> +            else:
>> +                command = ''
>> +            LOG.warning(msg, -int(exitcode), command)
>> +
>> +        self._launched = False
> 
> It would have been a little easier to review if this code motion to
> _post_shutdown() were a separate patch. It looks mostly independent of
> the other changes.
> 

That was hasty of me, you're right.

> The big change here is that we now log warnings for exit due to a signal
> for QEMUMachine.wait(), too. I assume this is okay.
> 

My assumption is that it's never wrong to see that QEMU was killed
instead of shut down gracefully.

>>      def launch(self):
>>          """
>>          Launch the VM and make sure we cleanup and expose the
>> @@ -344,12 +357,53 @@ def wait(self):
>>          self._popen.wait()
>>          if self._qmp:
>>              self._qmp.close()
>> -        self._load_io_log()
>>          self._post_shutdown()
>>  
>> -    def shutdown(self, has_quit=False, hard=False):
>> +    def _hard_shutdown(self) -> None:
>>          """
>> -        Terminate the VM and clean up
>> +        Kill the VM if it is running.
>> +        """
>> +        if not self.is_running():
>> +            return
>> +
>> +        self._popen.kill()
>> +        self._popen.wait(timeout=60)
>> +
>> +    def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
>> +        """
>> +        Attempt to shutdown the VM gracefully if it is running.
>> +
>> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
>> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
>> +        """
>> +        if not self.is_running():
>> +            return
>> +
>> +        if self._qmp is not None:
>> +            if not has_quit:
>> +                self._qmp.cmd('quit')
>> +            self._qmp.close()
>> +
>> +        self._popen.wait(timeout=timeout)
>> +
>> +    def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
>> +        """
>> +        Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
>> +
>> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
>> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
>> +        """
>> +        try:
>> +            self._soft_shutdown(has_quit, timeout)
>> +        except subprocess.TimeoutExpired:
>> +            self._hard_shutdown()
>> +        except:
>> +            self._hard_shutdown()
>> +            raise
>> +
>> +    def shutdown(self, has_quit: bool = False, hard: bool = False) -> None:
>> +        """
>> +        Terminate the VM (gracefully if possible) and perform cleanup.
>>          """
>>          # If we keep the console socket open, we may deadlock waiting
>>          # for QEMU to exit, while QEMU is waiting for the socket to
>> @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False):
>>              self._console_socket.close()
>>              self._console_socket = None
>>  
>> -        if self.is_running():
>> +        try:
>>              if hard:
>> -                self._popen.kill()
>> -            elif self._qmp:
>> -                try:
>> -                    if not has_quit:
>> -                        self._qmp.cmd('quit')
>> -                    self._qmp.close()
>> -                    self._popen.wait(timeout=3)
>> -                except:
>> -                    self._popen.kill()
>> -            self._popen.wait()
>> -
>> -        self._load_io_log()
>> -        self._post_shutdown()
>> -
>> -        exitcode = self.exitcode()
>> -        if exitcode is not None and exitcode < 0 and \
>> -                not (exitcode == -9 and hard):
> 
> The condition in the second line went away without a replacement. This
> means that previously, intentionally killing a VM with hard=True would
> not log a warning (which makes sense to me), but now it does.
> 
> On the other hand, I couldn't find a single user of hard=True or the
> kill() method in the whole tree. So I guess it doesn't matter anyway?
> 

I was thinking that it was never wrong to log a warning when QEMU ends
non-gracefully, but I suppose this is indeed not for exclusive use of
iotests and I should not question a user command to kill QEMU.

I addressed this in the commit message, but now I'm thinking I actually
should reinstate this logic.

>> -            msg = 'qemu received signal %i: %s'
>> -            if self._qemu_full_args:
>> -                command = ' '.join(self._qemu_full_args)
>> +                self._hard_shutdown()
>>              else:
>> -                command = ''
>> -            LOG.warning(msg, -int(exitcode), command)
>> -
>> -        self._launched = False
>> +                self._do_shutdown(has_quit)
>> +        finally:
>> +            self._post_shutdown()
>>  
>>      def kill(self):
>> +        """
>> +        Terminate the VM forcefully and perform cleanup.
>> +        """
>>          self.shutdown(hard=True)
>>  
>>      def set_qmp_monitor(self, enabled=True):
> 
> In the end, while the new behaviour is a bit different in details, it's
> not unreasonable, and if it does cause any problems, we can still change
> it back.
> 
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> 

I'll touch this up based on your feedback before taking your RB. Thank
you for the reviews so far! Our python cleaned up a lot faster and more
easily than I had expected.

I know we shouldn't over-focus on making tests beautiful, but I
perceived a real problem with script rot that I wanted to address.

Thanks for all of your help,
--js



      reply	other threads:[~2020-06-04 19:05 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02 19:48 [PATCH v2 0/1] python/machine.py: refactor shutdown John Snow
2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow
2020-06-04 12:39   ` Kevin Wolf
2020-06-04 19:03     ` John Snow [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a02be9a-0a7c-fedb-2139-e79095b201dd@redhat.com \
    --to=jsnow@redhat.com \
    --cc=armbru@redhat.com \
    --cc=crosa@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).