* [PATCH v2 0/1] python/machine.py: refactor shutdown
@ 2020-06-02 19:48 John Snow
2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow
0 siblings, 1 reply; 4+ messages in thread
From: John Snow @ 2020-06-02 19:48 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, Eduardo Habkost, philmd, Markus Armbruster, Cleber Rosa,
John Snow
v2: Philippe took patches 1, 3 and 4.
This is a re-write of what was:
[PATCH RFC 03/32] python//machine.py: remove bare except
[PATCH 2/4] python/machine.py: remove bare except
It's a bit heavier handed, but it should address some of kwolf's
feedback from the RFC version.
Applies straight to origin/master, ought to pass pylint and flake8:
> cd ~/src/qemu/python/qemu
> pylint *.py
> flake8 *.py
John Snow (1):
python/machine.py: refactor shutdown
python/qemu/machine.py | 93 +++++++++++++++++++++++++++++-------------
1 file changed, 65 insertions(+), 28 deletions(-)
--
2.21.3
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/1] python/machine.py: refactor shutdown 2020-06-02 19:48 [PATCH v2 0/1] python/machine.py: refactor shutdown John Snow @ 2020-06-02 19:48 ` John Snow 2020-06-04 12:39 ` Kevin Wolf 0 siblings, 1 reply; 4+ messages in thread From: John Snow @ 2020-06-02 19:48 UTC (permalink / raw) To: qemu-devel Cc: kwolf, Eduardo Habkost, philmd, Markus Armbruster, Cleber Rosa, John Snow This is done primarily to avoid the 'bare except' pattern, which suppresses ALL exceptions and not just ones that we are anticipating to see. Replace this with a pattern that isolates the different kind of shutdown paradigms and a new fallback shutdown handler that gracefully attempts one before the other. Move all of the post-shutdown code into _post_shutdown, and adjust the main shutdown() function to ALWAYS call the post_shutdown logic no matter what kind of error we encountered: An expected one (subprocess wait timeout) or an unexpected one (Everything else). In cases where we encounter an expected error in the graceful shutdown timeout, we will not re-raise an exception above shutdown(). Otherwise, after post_shutdown cleanup, we will. I anticipate that this WILL lead to additional bug reports filed against this module, but that is unfortunately somewhat the point: This code shouldn't be hiding failures that exist elsewhere within the python code. Signed-off-by: John Snow <jsnow@redhat.com> --- python/qemu/machine.py | 93 +++++++++++++++++++++++++++++------------- 1 file changed, 65 insertions(+), 28 deletions(-) diff --git a/python/qemu/machine.py b/python/qemu/machine.py index 041c615052e..a2abd2c35e3 100644 --- a/python/qemu/machine.py +++ b/python/qemu/machine.py @@ -283,6 +283,8 @@ def _post_launch(self): self._qmp.accept() def _post_shutdown(self): + self._load_io_log() + if self._qemu_log_file is not None: self._qemu_log_file.close() self._qemu_log_file = None @@ -296,6 +298,17 @@ def _post_shutdown(self): while len(self._remove_files) > 0: self._remove_if_exists(self._remove_files.pop()) + exitcode = self.exitcode() + if exitcode is not None and exitcode < 0: + msg = 'qemu received signal %i; command: "%s"' + if self._qemu_full_args: + command = ' '.join(self._qemu_full_args) + else: + command = '' + LOG.warning(msg, -int(exitcode), command) + + self._launched = False + def launch(self): """ Launch the VM and make sure we cleanup and expose the @@ -344,12 +357,53 @@ def wait(self): self._popen.wait() if self._qmp: self._qmp.close() - self._load_io_log() self._post_shutdown() - def shutdown(self, has_quit=False, hard=False): + def _hard_shutdown(self) -> None: """ - Terminate the VM and clean up + Kill the VM if it is running. + """ + if not self.is_running(): + return + + self._popen.kill() + self._popen.wait(timeout=60) + + def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: + """ + Attempt to shutdown the VM gracefully if it is running. + + :param has_quit: When True, don't attempt to issue 'quit' QMP command + :param timeout: Timeout for graceful shutdown. Default 3 seconds. + """ + if not self.is_running(): + return + + if self._qmp is not None: + if not has_quit: + self._qmp.cmd('quit') + self._qmp.close() + + self._popen.wait(timeout=timeout) + + def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: + """ + Attempt to shutdown the VM gracefully; fallback to a hard shutdown. + + :param has_quit: When True, don't attempt to issue 'quit' QMP command + :param timeout: Timeout for graceful shutdown. Default 3 seconds. + """ + try: + self._soft_shutdown(has_quit, timeout) + except subprocess.TimeoutExpired: + self._hard_shutdown() + except: + self._hard_shutdown() + raise + + def shutdown(self, has_quit: bool = False, hard: bool = False) -> None: + """ + Terminate the VM (gracefully if possible) and perform cleanup. """ # If we keep the console socket open, we may deadlock waiting # for QEMU to exit, while QEMU is waiting for the socket to @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False): self._console_socket.close() self._console_socket = None - if self.is_running(): + try: if hard: - self._popen.kill() - elif self._qmp: - try: - if not has_quit: - self._qmp.cmd('quit') - self._qmp.close() - self._popen.wait(timeout=3) - except: - self._popen.kill() - self._popen.wait() - - self._load_io_log() - self._post_shutdown() - - exitcode = self.exitcode() - if exitcode is not None and exitcode < 0 and \ - not (exitcode == -9 and hard): - msg = 'qemu received signal %i: %s' - if self._qemu_full_args: - command = ' '.join(self._qemu_full_args) + self._hard_shutdown() else: - command = '' - LOG.warning(msg, -int(exitcode), command) - - self._launched = False + self._do_shutdown(has_quit) + finally: + self._post_shutdown() def kill(self): + """ + Terminate the VM forcefully and perform cleanup. + """ self.shutdown(hard=True) def set_qmp_monitor(self, enabled=True): -- 2.21.3 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] python/machine.py: refactor shutdown 2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow @ 2020-06-04 12:39 ` Kevin Wolf 2020-06-04 19:03 ` John Snow 0 siblings, 1 reply; 4+ messages in thread From: Kevin Wolf @ 2020-06-04 12:39 UTC (permalink / raw) To: John Snow Cc: Cleber Rosa, philmd, qemu-devel, Eduardo Habkost, Markus Armbruster Am 02.06.2020 um 21:48 hat John Snow geschrieben: > This is done primarily to avoid the 'bare except' pattern, which > suppresses ALL exceptions and not just ones that we are anticipating to > see. > > Replace this with a pattern that isolates the different kind of shutdown > paradigms and a new fallback shutdown handler that gracefully attempts > one before the other. > > Move all of the post-shutdown code into _post_shutdown, and adjust the > main shutdown() function to ALWAYS call the post_shutdown logic no > matter what kind of error we encountered: An expected one (subprocess > wait timeout) or an unexpected one (Everything else). > > In cases where we encounter an expected error in the graceful shutdown > timeout, we will not re-raise an exception above shutdown(). Otherwise, > after post_shutdown cleanup, we will. > > I anticipate that this WILL lead to additional bug reports filed against > this module, but that is unfortunately somewhat the point: This code > shouldn't be hiding failures that exist elsewhere within the python > code. > > Signed-off-by: John Snow <jsnow@redhat.com> > --- > python/qemu/machine.py | 93 +++++++++++++++++++++++++++++------------- > 1 file changed, 65 insertions(+), 28 deletions(-) > > diff --git a/python/qemu/machine.py b/python/qemu/machine.py > index 041c615052e..a2abd2c35e3 100644 > --- a/python/qemu/machine.py > +++ b/python/qemu/machine.py > @@ -283,6 +283,8 @@ def _post_launch(self): > self._qmp.accept() > > def _post_shutdown(self): > + self._load_io_log() > + > if self._qemu_log_file is not None: > self._qemu_log_file.close() > self._qemu_log_file = None > @@ -296,6 +298,17 @@ def _post_shutdown(self): > while len(self._remove_files) > 0: > self._remove_if_exists(self._remove_files.pop()) > > + exitcode = self.exitcode() > + if exitcode is not None and exitcode < 0: > + msg = 'qemu received signal %i; command: "%s"' > + if self._qemu_full_args: > + command = ' '.join(self._qemu_full_args) > + else: > + command = '' > + LOG.warning(msg, -int(exitcode), command) > + > + self._launched = False It would have been a little easier to review if this code motion to _post_shutdown() were a separate patch. It looks mostly independent of the other changes. The big change here is that we now log warnings for exit due to a signal for QEMUMachine.wait(), too. I assume this is okay. > def launch(self): > """ > Launch the VM and make sure we cleanup and expose the > @@ -344,12 +357,53 @@ def wait(self): > self._popen.wait() > if self._qmp: > self._qmp.close() > - self._load_io_log() > self._post_shutdown() > > - def shutdown(self, has_quit=False, hard=False): > + def _hard_shutdown(self) -> None: > """ > - Terminate the VM and clean up > + Kill the VM if it is running. > + """ > + if not self.is_running(): > + return > + > + self._popen.kill() > + self._popen.wait(timeout=60) > + > + def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: > + """ > + Attempt to shutdown the VM gracefully if it is running. > + > + :param has_quit: When True, don't attempt to issue 'quit' QMP command > + :param timeout: Timeout for graceful shutdown. Default 3 seconds. > + """ > + if not self.is_running(): > + return > + > + if self._qmp is not None: > + if not has_quit: > + self._qmp.cmd('quit') > + self._qmp.close() > + > + self._popen.wait(timeout=timeout) > + > + def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: > + """ > + Attempt to shutdown the VM gracefully; fallback to a hard shutdown. > + > + :param has_quit: When True, don't attempt to issue 'quit' QMP command > + :param timeout: Timeout for graceful shutdown. Default 3 seconds. > + """ > + try: > + self._soft_shutdown(has_quit, timeout) > + except subprocess.TimeoutExpired: > + self._hard_shutdown() > + except: > + self._hard_shutdown() > + raise > + > + def shutdown(self, has_quit: bool = False, hard: bool = False) -> None: > + """ > + Terminate the VM (gracefully if possible) and perform cleanup. > """ > # If we keep the console socket open, we may deadlock waiting > # for QEMU to exit, while QEMU is waiting for the socket to > @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False): > self._console_socket.close() > self._console_socket = None > > - if self.is_running(): > + try: > if hard: > - self._popen.kill() > - elif self._qmp: > - try: > - if not has_quit: > - self._qmp.cmd('quit') > - self._qmp.close() > - self._popen.wait(timeout=3) > - except: > - self._popen.kill() > - self._popen.wait() > - > - self._load_io_log() > - self._post_shutdown() > - > - exitcode = self.exitcode() > - if exitcode is not None and exitcode < 0 and \ > - not (exitcode == -9 and hard): The condition in the second line went away without a replacement. This means that previously, intentionally killing a VM with hard=True would not log a warning (which makes sense to me), but now it does. On the other hand, I couldn't find a single user of hard=True or the kill() method in the whole tree. So I guess it doesn't matter anyway? > - msg = 'qemu received signal %i: %s' > - if self._qemu_full_args: > - command = ' '.join(self._qemu_full_args) > + self._hard_shutdown() > else: > - command = '' > - LOG.warning(msg, -int(exitcode), command) > - > - self._launched = False > + self._do_shutdown(has_quit) > + finally: > + self._post_shutdown() > > def kill(self): > + """ > + Terminate the VM forcefully and perform cleanup. > + """ > self.shutdown(hard=True) > > def set_qmp_monitor(self, enabled=True): In the end, while the new behaviour is a bit different in details, it's not unreasonable, and if it does cause any problems, we can still change it back. Reviewed-by: Kevin Wolf <kwolf@redhat.com> ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] python/machine.py: refactor shutdown 2020-06-04 12:39 ` Kevin Wolf @ 2020-06-04 19:03 ` John Snow 0 siblings, 0 replies; 4+ messages in thread From: John Snow @ 2020-06-04 19:03 UTC (permalink / raw) To: Kevin Wolf Cc: Cleber Rosa, philmd, qemu-devel, Eduardo Habkost, Markus Armbruster On 6/4/20 8:39 AM, Kevin Wolf wrote: > Am 02.06.2020 um 21:48 hat John Snow geschrieben: >> This is done primarily to avoid the 'bare except' pattern, which >> suppresses ALL exceptions and not just ones that we are anticipating to >> see. >> >> Replace this with a pattern that isolates the different kind of shutdown >> paradigms and a new fallback shutdown handler that gracefully attempts >> one before the other. >> >> Move all of the post-shutdown code into _post_shutdown, and adjust the >> main shutdown() function to ALWAYS call the post_shutdown logic no >> matter what kind of error we encountered: An expected one (subprocess >> wait timeout) or an unexpected one (Everything else). >> >> In cases where we encounter an expected error in the graceful shutdown >> timeout, we will not re-raise an exception above shutdown(). Otherwise, >> after post_shutdown cleanup, we will. >> >> I anticipate that this WILL lead to additional bug reports filed against >> this module, but that is unfortunately somewhat the point: This code >> shouldn't be hiding failures that exist elsewhere within the python >> code. >> >> Signed-off-by: John Snow <jsnow@redhat.com> >> --- >> python/qemu/machine.py | 93 +++++++++++++++++++++++++++++------------- >> 1 file changed, 65 insertions(+), 28 deletions(-) >> >> diff --git a/python/qemu/machine.py b/python/qemu/machine.py >> index 041c615052e..a2abd2c35e3 100644 >> --- a/python/qemu/machine.py >> +++ b/python/qemu/machine.py >> @@ -283,6 +283,8 @@ def _post_launch(self): >> self._qmp.accept() >> >> def _post_shutdown(self): >> + self._load_io_log() >> + >> if self._qemu_log_file is not None: >> self._qemu_log_file.close() >> self._qemu_log_file = None >> @@ -296,6 +298,17 @@ def _post_shutdown(self): >> while len(self._remove_files) > 0: >> self._remove_if_exists(self._remove_files.pop()) >> >> + exitcode = self.exitcode() >> + if exitcode is not None and exitcode < 0: >> + msg = 'qemu received signal %i; command: "%s"' >> + if self._qemu_full_args: >> + command = ' '.join(self._qemu_full_args) >> + else: >> + command = '' >> + LOG.warning(msg, -int(exitcode), command) >> + >> + self._launched = False > > It would have been a little easier to review if this code motion to > _post_shutdown() were a separate patch. It looks mostly independent of > the other changes. > That was hasty of me, you're right. > The big change here is that we now log warnings for exit due to a signal > for QEMUMachine.wait(), too. I assume this is okay. > My assumption is that it's never wrong to see that QEMU was killed instead of shut down gracefully. >> def launch(self): >> """ >> Launch the VM and make sure we cleanup and expose the >> @@ -344,12 +357,53 @@ def wait(self): >> self._popen.wait() >> if self._qmp: >> self._qmp.close() >> - self._load_io_log() >> self._post_shutdown() >> >> - def shutdown(self, has_quit=False, hard=False): >> + def _hard_shutdown(self) -> None: >> """ >> - Terminate the VM and clean up >> + Kill the VM if it is running. >> + """ >> + if not self.is_running(): >> + return >> + >> + self._popen.kill() >> + self._popen.wait(timeout=60) >> + >> + def _soft_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: >> + """ >> + Attempt to shutdown the VM gracefully if it is running. >> + >> + :param has_quit: When True, don't attempt to issue 'quit' QMP command >> + :param timeout: Timeout for graceful shutdown. Default 3 seconds. >> + """ >> + if not self.is_running(): >> + return >> + >> + if self._qmp is not None: >> + if not has_quit: >> + self._qmp.cmd('quit') >> + self._qmp.close() >> + >> + self._popen.wait(timeout=timeout) >> + >> + def _do_shutdown(self, has_quit: bool = False, timeout: int = 3) -> None: >> + """ >> + Attempt to shutdown the VM gracefully; fallback to a hard shutdown. >> + >> + :param has_quit: When True, don't attempt to issue 'quit' QMP command >> + :param timeout: Timeout for graceful shutdown. Default 3 seconds. >> + """ >> + try: >> + self._soft_shutdown(has_quit, timeout) >> + except subprocess.TimeoutExpired: >> + self._hard_shutdown() >> + except: >> + self._hard_shutdown() >> + raise >> + >> + def shutdown(self, has_quit: bool = False, hard: bool = False) -> None: >> + """ >> + Terminate the VM (gracefully if possible) and perform cleanup. >> """ >> # If we keep the console socket open, we may deadlock waiting >> # for QEMU to exit, while QEMU is waiting for the socket to >> @@ -358,35 +412,18 @@ def shutdown(self, has_quit=False, hard=False): >> self._console_socket.close() >> self._console_socket = None >> >> - if self.is_running(): >> + try: >> if hard: >> - self._popen.kill() >> - elif self._qmp: >> - try: >> - if not has_quit: >> - self._qmp.cmd('quit') >> - self._qmp.close() >> - self._popen.wait(timeout=3) >> - except: >> - self._popen.kill() >> - self._popen.wait() >> - >> - self._load_io_log() >> - self._post_shutdown() >> - >> - exitcode = self.exitcode() >> - if exitcode is not None and exitcode < 0 and \ >> - not (exitcode == -9 and hard): > > The condition in the second line went away without a replacement. This > means that previously, intentionally killing a VM with hard=True would > not log a warning (which makes sense to me), but now it does. > > On the other hand, I couldn't find a single user of hard=True or the > kill() method in the whole tree. So I guess it doesn't matter anyway? > I was thinking that it was never wrong to log a warning when QEMU ends non-gracefully, but I suppose this is indeed not for exclusive use of iotests and I should not question a user command to kill QEMU. I addressed this in the commit message, but now I'm thinking I actually should reinstate this logic. >> - msg = 'qemu received signal %i: %s' >> - if self._qemu_full_args: >> - command = ' '.join(self._qemu_full_args) >> + self._hard_shutdown() >> else: >> - command = '' >> - LOG.warning(msg, -int(exitcode), command) >> - >> - self._launched = False >> + self._do_shutdown(has_quit) >> + finally: >> + self._post_shutdown() >> >> def kill(self): >> + """ >> + Terminate the VM forcefully and perform cleanup. >> + """ >> self.shutdown(hard=True) >> >> def set_qmp_monitor(self, enabled=True): > > In the end, while the new behaviour is a bit different in details, it's > not unreasonable, and if it does cause any problems, we can still change > it back. > > Reviewed-by: Kevin Wolf <kwolf@redhat.com> > I'll touch this up based on your feedback before taking your RB. Thank you for the reviews so far! Our python cleaned up a lot faster and more easily than I had expected. I know we shouldn't over-focus on making tests beautiful, but I perceived a real problem with script rot that I wanted to address. Thanks for all of your help, --js ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-06-04 19:05 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-02 19:48 [PATCH v2 0/1] python/machine.py: refactor shutdown John Snow 2020-06-02 19:48 ` [PATCH v2 1/1] " John Snow 2020-06-04 12:39 ` Kevin Wolf 2020-06-04 19:03 ` John Snow
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).