qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/1] python/machine.py: refactor shutdown
@ 2020-06-02 19:48 John Snow
  2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow
  0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2020-06-02 19:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Eduardo Habkost, philmd, Markus Armbruster, Cleber Rosa,
	John Snow

v2: Philippe took patches 1, 3 and 4.

This is a re-write of what was:
[PATCH RFC 03/32] python//machine.py: remove bare except
[PATCH 2/4] python/machine.py: remove bare except

It's a bit heavier handed, but it should address some of kwolf's
feedback from the RFC version.

Applies straight to origin/master, ought to pass pylint and flake8:

> cd ~/src/qemu/python/qemu
> pylint *.py
> flake8 *.py

John Snow (1):
  python/machine.py: refactor shutdown

 python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
 1 file changed, 65 insertions(+), 28 deletions(-)

-- 
2.21.3



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2 1/1] python/machine.py: refactor shutdown
  2020-06-02 19:48 [PATCH v2 0/1] python/machine.py: refactor shutdown John Snow
@ 2020-06-02 19:48 ` John Snow
  2020-06-04 12:39   ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2020-06-02 19:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, Eduardo Habkost, philmd, Markus Armbruster, Cleber Rosa,
	John Snow

This is done primarily to avoid the 'bare except' pattern, which
suppresses ALL exceptions and not just ones that we are anticipating to
see.

Replace this with a pattern that isolates the different kind of shutdown
paradigms and a new fallback shutdown handler that gracefully attempts
one before the other.

Move all of the post-shutdown code into _post_shutdown, and adjust the
main shutdown() function to ALWAYS call the post_shutdown logic no
matter what kind of error we encountered: An expected one (subprocess
wait timeout) or an unexpected one (Everything else).

In cases where we encounter an expected error in the graceful shutdown
timeout, we will not re-raise an exception above shutdown(). Otherwise,
after post_shutdown cleanup, we will.

I anticipate that this WILL lead to additional bug reports filed against
this module, but that is unfortunately somewhat the point: This code
shouldn't be hiding failures that exist elsewhere within the python
code.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
 1 file changed, 65 insertions(+), 28 deletions(-)

diff --git a/python/qemu/machine.py b/python/qemu/machine.py
index 041c615052e..a2abd2c35e3 100644
--- a/python/qemu/machine.py
+++ b/python/qemu/machine.py
@@ -283,6 +283,8 @@ def _post_launch(self):
             self._qmp.accept()
 
     def _post_shutdown(self):
+        self._load_io_log()
+
         if self._qemu_log_file is not None:
             self._qemu_log_file.close()
             self._qemu_log_file = None
@@ -296,6 +298,17 @@ def _post_shutdown(self):
         while len(self._remove_files) > 0:
             self._remove_if_exists(self._remove_files.pop())
 
+        exitcode = self.exitcode()
+        if exitcode is not None and exitcode < 0:
+            msg = 'qemu received signal %i; command: "%s"'
+            if self._qemu_full_args:
+                command = ' '.join(self._qemu_full_args)
+            else:
+                command = ''
+            LOG.warning(msg, -int(exitcode), command)
+
+        self._launched = False
+
     def launch(self):
         """
         Launch the VM and make sure we cleanup and expose the
@@ -344,12 +357,53 @@ def wait(self):
         self._popen.wait()
         if self._qmp:
             self._qmp.close()
-        self._load_io_log()
         self._post_shutdown()
 
-    def shutdown(self, has_quit=False, hard=False):
+    def _hard_shutdown(self) -> None:
         """
-        Terminate the VM and clean up
+        Kill the VM if it is running.
+        """
+        if not self.is_running():
+            return
+
+        self._popen.kill()
+        self._popen.wait(timeout=60)
+
+    def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
+        """
+        Attempt to shutdown the VM gracefully if it is running.
+
+        :param has_quit: When True, don't attempt to issue 'quit' QMP command
+        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
+        """
+        if not self.is_running():
+            return
+
+        if self._qmp is not None:
+            if not has_quit:
+                self._qmp.cmd('quit')
+            self._qmp.close()
+
+        self._popen.wait(timeout=timeout)
+
+    def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
+        """
+        Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
+
+        :param has_quit: When True, don't attempt to issue 'quit' QMP command
+        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
+        """
+        try:
+            self._soft_shutdown(has_quit, timeout)
+        except subprocess.TimeoutExpired:
+            self._hard_shutdown()
+        except:
+            self._hard_shutdown()
+            raise
+
+    def shutdown(self, has_quit: bool = False, hard: bool = False) -> None:
+        """
+        Terminate the VM (gracefully if possible) and perform cleanup.
         """
         # If we keep the console socket open, we may deadlock waiting
         # for QEMU to exit, while QEMU is waiting for the socket to
@@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False):
             self._console_socket.close()
             self._console_socket = None
 
-        if self.is_running():
+        try:
             if hard:
-                self._popen.kill()
-            elif self._qmp:
-                try:
-                    if not has_quit:
-                        self._qmp.cmd('quit')
-                    self._qmp.close()
-                    self._popen.wait(timeout=3)
-                except:
-                    self._popen.kill()
-            self._popen.wait()
-
-        self._load_io_log()
-        self._post_shutdown()
-
-        exitcode = self.exitcode()
-        if exitcode is not None and exitcode < 0 and \
-                not (exitcode == -9 and hard):
-            msg = 'qemu received signal %i: %s'
-            if self._qemu_full_args:
-                command = ' '.join(self._qemu_full_args)
+                self._hard_shutdown()
             else:
-                command = ''
-            LOG.warning(msg, -int(exitcode), command)
-
-        self._launched = False
+                self._do_shutdown(has_quit)
+        finally:
+            self._post_shutdown()
 
     def kill(self):
+        """
+        Terminate the VM forcefully and perform cleanup.
+        """
         self.shutdown(hard=True)
 
     def set_qmp_monitor(self, enabled=True):
-- 
2.21.3



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/1] python/machine.py: refactor shutdown
  2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow
@ 2020-06-04 12:39   ` Kevin Wolf
  2020-06-04 19:03     ` John Snow
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2020-06-04 12:39 UTC (permalink / raw)
  To: John Snow
  Cc: Cleber Rosa, philmd, qemu-devel, Eduardo Habkost,
	Markus Armbruster

Am 02.06.2020 um 21:48 hat John Snow geschrieben:
> This is done primarily to avoid the 'bare except' pattern, which
> suppresses ALL exceptions and not just ones that we are anticipating to
> see.
> 
> Replace this with a pattern that isolates the different kind of shutdown
> paradigms and a new fallback shutdown handler that gracefully attempts
> one before the other.
> 
> Move all of the post-shutdown code into _post_shutdown, and adjust the
> main shutdown() function to ALWAYS call the post_shutdown logic no
> matter what kind of error we encountered: An expected one (subprocess
> wait timeout) or an unexpected one (Everything else).
> 
> In cases where we encounter an expected error in the graceful shutdown
> timeout, we will not re-raise an exception above shutdown(). Otherwise,
> after post_shutdown cleanup, we will.
> 
> I anticipate that this WILL lead to additional bug reports filed against
> this module, but that is unfortunately somewhat the point: This code
> shouldn't be hiding failures that exist elsewhere within the python
> code.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
>  1 file changed, 65 insertions(+), 28 deletions(-)
> 
> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
> index 041c615052e..a2abd2c35e3 100644
> --- a/python/qemu/machine.py
> +++ b/python/qemu/machine.py
> @@ -283,6 +283,8 @@ def _post_launch(self):
>              self._qmp.accept()
>  
>      def _post_shutdown(self):
> +        self._load_io_log()
> +
>          if self._qemu_log_file is not None:
>              self._qemu_log_file.close()
>              self._qemu_log_file = None
> @@ -296,6 +298,17 @@ def _post_shutdown(self):
>          while len(self._remove_files) > 0:
>              self._remove_if_exists(self._remove_files.pop())
>  
> +        exitcode = self.exitcode()
> +        if exitcode is not None and exitcode < 0:
> +            msg = 'qemu received signal %i; command: "%s"'
> +            if self._qemu_full_args:
> +                command = ' '.join(self._qemu_full_args)
> +            else:
> +                command = ''
> +            LOG.warning(msg, -int(exitcode), command)
> +
> +        self._launched = False

It would have been a little easier to review if this code motion to
_post_shutdown() were a separate patch. It looks mostly independent of
the other changes.

The big change here is that we now log warnings for exit due to a signal
for QEMUMachine.wait(), too. I assume this is okay.

>      def launch(self):
>          """
>          Launch the VM and make sure we cleanup and expose the
> @@ -344,12 +357,53 @@ def wait(self):
>          self._popen.wait()
>          if self._qmp:
>              self._qmp.close()
> -        self._load_io_log()
>          self._post_shutdown()
>  
> -    def shutdown(self, has_quit=False, hard=False):
> +    def _hard_shutdown(self) -> None:
>          """
> -        Terminate the VM and clean up
> +        Kill the VM if it is running.
> +        """
> +        if not self.is_running():
> +            return
> +
> +        self._popen.kill()
> +        self._popen.wait(timeout=60)
> +
> +    def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
> +        """
> +        Attempt to shutdown the VM gracefully if it is running.
> +
> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
> +        """
> +        if not self.is_running():
> +            return
> +
> +        if self._qmp is not None:
> +            if not has_quit:
> +                self._qmp.cmd('quit')
> +            self._qmp.close()
> +
> +        self._popen.wait(timeout=timeout)
> +
> +    def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
> +        """
> +        Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
> +
> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
> +        """
> +        try:
> +            self._soft_shutdown(has_quit, timeout)
> +        except subprocess.TimeoutExpired:
> +            self._hard_shutdown()
> +        except:
> +            self._hard_shutdown()
> +            raise
> +
> +    def shutdown(self, has_quit: bool = False, hard: bool = False) -> None:
> +        """
> +        Terminate the VM (gracefully if possible) and perform cleanup.
>          """
>          # If we keep the console socket open, we may deadlock waiting
>          # for QEMU to exit, while QEMU is waiting for the socket to
> @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False):
>              self._console_socket.close()
>              self._console_socket = None
>  
> -        if self.is_running():
> +        try:
>              if hard:
> -                self._popen.kill()
> -            elif self._qmp:
> -                try:
> -                    if not has_quit:
> -                        self._qmp.cmd('quit')
> -                    self._qmp.close()
> -                    self._popen.wait(timeout=3)
> -                except:
> -                    self._popen.kill()
> -            self._popen.wait()
> -
> -        self._load_io_log()
> -        self._post_shutdown()
> -
> -        exitcode = self.exitcode()
> -        if exitcode is not None and exitcode < 0 and \
> -                not (exitcode == -9 and hard):

The condition in the second line went away without a replacement. This
means that previously, intentionally killing a VM with hard=True would
not log a warning (which makes sense to me), but now it does.

On the other hand, I couldn't find a single user of hard=True or the
kill() method in the whole tree. So I guess it doesn't matter anyway?

> -            msg = 'qemu received signal %i: %s'
> -            if self._qemu_full_args:
> -                command = ' '.join(self._qemu_full_args)
> +                self._hard_shutdown()
>              else:
> -                command = ''
> -            LOG.warning(msg, -int(exitcode), command)
> -
> -        self._launched = False
> +                self._do_shutdown(has_quit)
> +        finally:
> +            self._post_shutdown()
>  
>      def kill(self):
> +        """
> +        Terminate the VM forcefully and perform cleanup.
> +        """
>          self.shutdown(hard=True)
>  
>      def set_qmp_monitor(self, enabled=True):

In the end, while the new behaviour is a bit different in details, it's
not unreasonable, and if it does cause any problems, we can still change
it back.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/1] python/machine.py: refactor shutdown
  2020-06-04 12:39   ` Kevin Wolf
@ 2020-06-04 19:03     ` John Snow
  0 siblings, 0 replies; 4+ messages in thread
From: John Snow @ 2020-06-04 19:03 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Cleber Rosa, philmd, qemu-devel, Eduardo Habkost,
	Markus Armbruster



On 6/4/20 8:39 AM, Kevin Wolf wrote:
> Am 02.06.2020 um 21:48 hat John Snow geschrieben:
>> This is done primarily to avoid the 'bare except' pattern, which
>> suppresses ALL exceptions and not just ones that we are anticipating to
>> see.
>>
>> Replace this with a pattern that isolates the different kind of shutdown
>> paradigms and a new fallback shutdown handler that gracefully attempts
>> one before the other.
>>
>> Move all of the post-shutdown code into _post_shutdown, and adjust the
>> main shutdown() function to ALWAYS call the post_shutdown logic no
>> matter what kind of error we encountered: An expected one (subprocess
>> wait timeout) or an unexpected one (Everything else).
>>
>> In cases where we encounter an expected error in the graceful shutdown
>> timeout, we will not re-raise an exception above shutdown(). Otherwise,
>> after post_shutdown cleanup, we will.
>>
>> I anticipate that this WILL lead to additional bug reports filed against
>> this module, but that is unfortunately somewhat the point: This code
>> shouldn't be hiding failures that exist elsewhere within the python
>> code.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
>>  1 file changed, 65 insertions(+), 28 deletions(-)
>>
>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>> index 041c615052e..a2abd2c35e3 100644
>> --- a/python/qemu/machine.py
>> +++ b/python/qemu/machine.py
>> @@ -283,6 +283,8 @@ def _post_launch(self):
>>              self._qmp.accept()
>>  
>>      def _post_shutdown(self):
>> +        self._load_io_log()
>> +
>>          if self._qemu_log_file is not None:
>>              self._qemu_log_file.close()
>>              self._qemu_log_file = None
>> @@ -296,6 +298,17 @@ def _post_shutdown(self):
>>          while len(self._remove_files) > 0:
>>              self._remove_if_exists(self._remove_files.pop())
>>  
>> +        exitcode = self.exitcode()
>> +        if exitcode is not None and exitcode < 0:
>> +            msg = 'qemu received signal %i; command: "%s"'
>> +            if self._qemu_full_args:
>> +                command = ' '.join(self._qemu_full_args)
>> +            else:
>> +                command = ''
>> +            LOG.warning(msg, -int(exitcode), command)
>> +
>> +        self._launched = False
> 
> It would have been a little easier to review if this code motion to
> _post_shutdown() were a separate patch. It looks mostly independent of
> the other changes.
> 

That was hasty of me, you're right.

> The big change here is that we now log warnings for exit due to a signal
> for QEMUMachine.wait(), too. I assume this is okay.
> 

My assumption is that it's never wrong to see that QEMU was killed
instead of shut down gracefully.

>>      def launch(self):
>>          """
>>          Launch the VM and make sure we cleanup and expose the
>> @@ -344,12 +357,53 @@ def wait(self):
>>          self._popen.wait()
>>          if self._qmp:
>>              self._qmp.close()
>> -        self._load_io_log()
>>          self._post_shutdown()
>>  
>> -    def shutdown(self, has_quit=False, hard=False):
>> +    def _hard_shutdown(self) -> None:
>>          """
>> -        Terminate the VM and clean up
>> +        Kill the VM if it is running.
>> +        """
>> +        if not self.is_running():
>> +            return
>> +
>> +        self._popen.kill()
>> +        self._popen.wait(timeout=60)
>> +
>> +    def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
>> +        """
>> +        Attempt to shutdown the VM gracefully if it is running.
>> +
>> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
>> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
>> +        """
>> +        if not self.is_running():
>> +            return
>> +
>> +        if self._qmp is not None:
>> +            if not has_quit:
>> +                self._qmp.cmd('quit')
>> +            self._qmp.close()
>> +
>> +        self._popen.wait(timeout=timeout)
>> +
>> +    def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None:
>> +        """
>> +        Attempt to shutdown the VM gracefully; fallback to a hard shutdown.
>> +
>> +        :param has_quit: When True, don't attempt to issue 'quit' QMP command
>> +        :param timeout: Timeout for graceful shutdown. Default 3 seconds.
>> +        """
>> +        try:
>> +            self._soft_shutdown(has_quit, timeout)
>> +        except subprocess.TimeoutExpired:
>> +            self._hard_shutdown()
>> +        except:
>> +            self._hard_shutdown()
>> +            raise
>> +
>> +    def shutdown(self, has_quit: bool = False, hard: bool = False) -> None:
>> +        """
>> +        Terminate the VM (gracefully if possible) and perform cleanup.
>>          """
>>          # If we keep the console socket open, we may deadlock waiting
>>          # for QEMU to exit, while QEMU is waiting for the socket to
>> @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False):
>>              self._console_socket.close()
>>              self._console_socket = None
>>  
>> -        if self.is_running():
>> +        try:
>>              if hard:
>> -                self._popen.kill()
>> -            elif self._qmp:
>> -                try:
>> -                    if not has_quit:
>> -                        self._qmp.cmd('quit')
>> -                    self._qmp.close()
>> -                    self._popen.wait(timeout=3)
>> -                except:
>> -                    self._popen.kill()
>> -            self._popen.wait()
>> -
>> -        self._load_io_log()
>> -        self._post_shutdown()
>> -
>> -        exitcode = self.exitcode()
>> -        if exitcode is not None and exitcode < 0 and \
>> -                not (exitcode == -9 and hard):
> 
> The condition in the second line went away without a replacement. This
> means that previously, intentionally killing a VM with hard=True would
> not log a warning (which makes sense to me), but now it does.
> 
> On the other hand, I couldn't find a single user of hard=True or the
> kill() method in the whole tree. So I guess it doesn't matter anyway?
> 

I was thinking that it was never wrong to log a warning when QEMU ends
non-gracefully, but I suppose this is indeed not for exclusive use of
iotests and I should not question a user command to kill QEMU.

I addressed this in the commit message, but now I'm thinking I actually
should reinstate this logic.

>> -            msg = 'qemu received signal %i: %s'
>> -            if self._qemu_full_args:
>> -                command = ' '.join(self._qemu_full_args)
>> +                self._hard_shutdown()
>>              else:
>> -                command = ''
>> -            LOG.warning(msg, -int(exitcode), command)
>> -
>> -        self._launched = False
>> +                self._do_shutdown(has_quit)
>> +        finally:
>> +            self._post_shutdown()
>>  
>>      def kill(self):
>> +        """
>> +        Terminate the VM forcefully and perform cleanup.
>> +        """
>>          self.shutdown(hard=True)
>>  
>>      def set_qmp_monitor(self, enabled=True):
> 
> In the end, while the new behaviour is a bit different in details, it's
> not unreasonable, and if it does cause any problems, we can still change
> it back.
> 
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> 

I'll touch this up based on your feedback before taking your RB. Thank
you for the reviews so far! Our python cleaned up a lot faster and more
easily than I had expected.

I know we shouldn't over-focus on making tests beautiful, but I
perceived a real problem with script rot that I wanted to address.

Thanks for all of your help,
--js



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-06-04 19:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-06-02 19:48 [PATCH v2 0/1] python/machine.py: refactor shutdown John Snow
2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow
2020-06-04 12:39   ` Kevin Wolf
2020-06-04 19:03     ` John Snow

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).