qemu-devel.nongnu.org archive mirror
From: "Lukáš Doktor" <ldoktor@redhat.com>
To: Amador Pahim <apahim@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>,
	kwolf@redhat.com, Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, mreitz@redhat.com,
	Cleber Rosa <crosa@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running()
Date: Fri, 21 Jul 2017 09:34:37 +0200	[thread overview]
Message-ID: <7550b799-e417-5dac-40b6-d2207fc9d5d2@redhat.com> (raw)
In-Reply-To: <CALAZnb3W+NnjKafbmZ1Bn77Dm=YqfkVDKinz+VU2BF7D-x-7mA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7800 bytes --]

On 20.7.2017 at 22:14, Amador Pahim wrote:
> On Thu, Jul 20, 2017 at 7:49 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
>> On Thu, Jul 20, 2017 at 05:09:11PM +0200, Markus Armbruster wrote:
>>> Amador Pahim <apahim@redhat.com> writes:
>>>
>>>> On Thu, Jul 20, 2017 at 1:49 PM, Markus Armbruster <armbru@redhat.com> wrote:
>>>>> Amador Pahim <apahim@redhat.com> writes:
>>>>>
>>>>>> Current implementation is broken. It does not really test if the child
>>>>>> process is running.
>>>>>
>>>>> What usage exactly is broken by this?  Got a reproducer for me?
>>>>
>>>> The problem is that 'returncode' is not set without calling
>>>> poll()/wait()/communicate(), so it's only useful to test if the
>>>> process is running after such calls. But if we use 'poll()' instead,
>>>> it will, according to the docs, "Check if child process has
>>>> terminated. Set and return returncode attribute."
>>>>
>>>> Reproducer is:
>>>>
>>>>  >>> import subprocess
>>>>  >>> devnull = open('/dev/null', 'rb')
>>>>  >>> p = subprocess.Popen(['qemu-system-x86_64', '-broken'],
>>>> stdin=devnull, stdout=devnull, stderr=devnull, shell=False)
>>>>  >>> print p.returncode
>>>>  None
>>>>  >>> print p.poll()
>>>>  1
>>>>  >>> print p.returncode
>>>>  1
>>>>
>>>>>> The Popen.returncode will only be set after by a poll(), wait() or
>>>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>>>>> will not turn to None by itself.
>>>>>
>>>>> Hmm.  What is the value of .returncode then?
>>>>
>>>> returncode starts with None and becomes the process exit code when the
>>>> process is over and one of those three methods is called (poll(),
>>>> wait() or communicate()).
>>>>
>>>> There's an error in my description though. The correct would be: "The
>>>> Popen.returncode will only be set after a call to poll(), wait() or
>>>> communicate(). If the Popen fails to launch a VM, the Popen.returncode
>>>> will not turn from None to the actual return code by itself."
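For checking this without QEMU installed, here is a minimal sketch of the same behaviour that uses only the standard library. It assumes `sys.executable` running `sys.exit(1)` is a fair stand-in for `qemu-system-x86_64 -broken`, i.e. a child that exits immediately with status 1:

```python
import subprocess
import sys

# sys.executable stands in for the QEMU binary (an assumption, so this runs
# anywhere Python does); the child exits at once with status 1.
proc = subprocess.Popen([sys.executable, '-c', 'import sys; sys.exit(1)'])

# Nothing has called poll()/wait()/communicate() yet, so returncode is
# still None whether or not the child has already exited.
before = proc.returncode

# wait() blocks until the child exits, reaps it and stores its status.
status = proc.wait()
after = proc.returncode

print('%s %s %s' % (before, status, after))  # None 1 1
```

`returncode` is only a cache of the last status that `poll()`, `wait()` or `communicate()` observed; it is never updated in the background.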
>>>
>>> Suggest to add ", and is_running() continues to report True".
>>>
>>>>>> Instead of using Popen.returncode, let's use Popen.poll(), which
>>>>>> actually checks if child process has terminated.
>>>>>>
>>>>>> Signed-off-by: Amador Pahim <apahim@redhat.com>
>>>>>> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
>>>>>> Reviewed-by: Fam Zheng <famz@redhat.com>
>>>>>> ---
>>>>>>  scripts/qemu.py | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/scripts/qemu.py b/scripts/qemu.py
>>>>>> index 880e3e8219..f0fade32bd 100644
>>>>>> --- a/scripts/qemu.py
>>>>>> +++ b/scripts/qemu.py
>>>>>> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>>>>>>              raise
>>>>>>
>>>>>>      def is_running(self):
>>>>>> -        return self._popen and (self._popen.returncode is None)
>>>>>> +        return self._popen and (self._popen.poll() is None)
>>>>>>
>>
>> After re-reading shutdown(), I think this is _not_ OK: if
>> is_running() returns False before we call .wait(), we will never
>> load the log file or run _post_shutdown() if QEMU exits between
>> the launch() and shutdown() calls.
> 
> Yes, I just noticed that while cleaning up the code.
> 
>>
>> Yes, it's fragile.
>>
>> The root problem on both launch() and shutdown() seems to be
>> coupling the external "is QEMU running?" state with the internal
>> "did we load the log file and run _post_shutdown() already?"
>> state.
>>
>> I see two possible approaches for this:
>>
>> 1) Benefit from the fact that the internal Popen state will not
>>    change under our feet unless we explicitly call
>>    poll()/wait()/etc, and keep the existing code.  (Not my
>>    favorite option)
>>
>> 2) Rewrite the code so that we don't depend on the subtle Popen
>>    internal state rules, and track our own internal state in
>>    a QEMUMachine attribute.  e.g.:
> 
> +1 for this approach. I'm working on something similar, thanks for the
> detailed "e.g." code here.
> 
>>
>>     def _handle_shutdown(self):
>>         '''Load log file and call _post_shutdown() hook if necessary'''
>>         # Must be called only after QEMU actually exited.
>>         assert not self.is_running()
>>         if self._shutdown_pending:
>>             exitcode = self.exitcode()
>>             if exitcode is not None and exitcode < 0:
>>                 sys.stderr.write('qemu received signal %i: %s\n' % (-exitcode, ' '.join(self._args)))
>>             self._load_io_log()
>>             self._post_shutdown()
>>             self._shutdown_pending = False
>>
>>     def _terminate(self):
>>         '''Terminate QEMU if it's still running'''
>>         if self.is_running():
>>             try:
>>                 self._qmp.cmd('quit')
>>                 self._qmp.close()
>>             except:
>>                 self._popen.kill()
>>                 self._popen.wait()
>>
>>     def _launch(self):
>>         '''Launch the VM and establish a QMP connection'''
>>         devnull = open('/dev/null', 'rb')
>>         qemulog = open(self._qemu_log_path, 'wb')
>>         self._shutdown_pending = True
>>         self._pre_launch()
>>         args = self._wrapper + [self._binary] + self._base_args() + self._args
>>         self._popen = subprocess.Popen(args, stdin=devnull, stdout=qemulog,
>>                                        stderr=subprocess.STDOUT, shell=False)
>>         self._post_launch()
>>
>>     def launch(self):
>>         try:
>>             self._launch()
>>         except:
>>             self._terminate()
>>             self._handle_shutdown()
>>             raise
>>
>>     def shutdown(self):
>>         '''Terminate the VM and clean up'''
>>         self._terminate()
>>         self._handle_shutdown()
>>
This part also caught my attention, and I meant to improve it once this series is merged. Anyway, here are my suggestions; take them or leave them:

1. `get_log` should first check whether `self._iolog` is `None`, and only then check the process status
2. whether `self._iolog` is `None` indicates whether `shutdown` was called (not whether the process exists)
3. add `__del__` to clean up in case one forgets to call `shutdown` (currently the files and processes are left behind)
4. use `name = "qemu-%d-%d" % (os.getpid(), id(self))` to allow multiple instances with the default name at the same time.


Also, I just realized that even with this patch alone (as is), files and processes can be left behind:

    >>> import qemu, os
    >>> a=qemu.QEMUMachine("/usr/bin/qemu-kvm", debug=True)
    >>> a.launch()
    QMP:>>> {'execute': 'qmp_capabilities'}
    QMP:<<< {u'return': {}}
    >>> a.is_running()
    False
    >>> a.shutdown()
    >>> os.path.exists(a._qemu_log_path)
    True

Before this patch it worked well because, as Eduardo mentioned, `is_running` was tracking internal state, not the process state.
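A minimal runnable sketch of Eduardo's approach 2 (a separate `_shutdown_pending` flag instead of deriving cleanup from `is_running()`). `MiniMachine` and its `post_shutdown_calls` counter are hypothetical; it runs `sys.executable` instead of QEMU and drops QMP and the log file, keeping only the shutdown bookkeeping. `exitcode()` uses `poll()` too, so it stays consistent with `is_running()`:

```python
import subprocess
import sys
import time


class MiniMachine(object):
    '''Hypothetical stand-in for QEMUMachine (not the real class).'''

    def __init__(self, args):
        self._args = args
        self._popen = None
        self._shutdown_pending = False
        self.post_shutdown_calls = 0  # how many times cleanup actually ran

    def is_running(self):
        # Real process state; poll() reaps the child if it has exited.
        return self._popen is not None and self._popen.poll() is None

    def exitcode(self):
        # Also based on poll(), so it agrees with is_running().
        if self._popen is None:
            return None
        return self._popen.poll()

    def _post_shutdown(self):
        self.post_shutdown_calls += 1

    def _handle_shutdown(self):
        '''Run cleanup once, and only after the process has exited.'''
        assert not self.is_running()
        if self._shutdown_pending:
            exitcode = self.exitcode()
            if exitcode is not None and exitcode < 0:
                sys.stderr.write('child received signal %i\n' % -exitcode)
            self._post_shutdown()
            self._shutdown_pending = False

    def _terminate(self):
        '''Kill the process if it is still running, then reap it.'''
        if self.is_running():
            self._popen.kill()
            self._popen.wait()

    def launch(self):
        self._shutdown_pending = True  # set before Popen, as in approach 2
        self._popen = subprocess.Popen(self._args)

    def shutdown(self):
        self._terminate()
        self._handle_shutdown()


# Case 1: the child exits on its own before shutdown(); cleanup still runs.
vm = MiniMachine([sys.executable, '-c', 'pass'])
vm.launch()
while vm.is_running():
    time.sleep(0.01)
vm.shutdown()
vm.shutdown()  # second call is a no-op
print(vm.post_shutdown_calls)  # 1

# Case 2: a long-running child is killed by shutdown().
vm2 = MiniMachine([sys.executable, '-c', 'import time; time.sleep(60)'])
vm2.launch()
vm2.shutdown()
```

Case 1 is the scenario from the reproducer above: the process is gone before `shutdown()`, yet cleanup happens exactly once because it is keyed on the flag, not on the process state.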

Regards,
Lukáš

>>
>> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>>
>>>>>>      def exitcode(self):
>>>>>>          if self._popen is None:
>>>>>                return None
>>>>>            return self._popen.returncode
>>>>>
>>>>> Why is this one safe?
>>>>
>>>> Here it's used just to retrieve the value from the Popen.returncode.
>>>> It's not being used to check whether the process is running or not.
>>>
>>> If self._popen is not None, we return self._popen.returncode.  It's None
>>> if .poll() etc. haven't been called.  Can this happen?  If not, why not?
>>> If yes, why is returning None then okay?
>>
>> It can't happen because the only caller of exitcode()
>> (device-crash-test) calls it immediately after shutdown().  But
>> it would be nice to make exitcode() behavior consistent with
>> is_running().
>>
>> --
>> Eduardo



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 502 bytes --]

Thread overview: 17+ messages
2017-07-20  9:18 [Qemu-devel] [PATCH v3 0/3] scripts/qemu.py small fixes Amador Pahim
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 1/3] qemu.py: fix is_running() Amador Pahim
2017-07-20 11:49   ` Markus Armbruster
2017-07-20 12:57     ` Amador Pahim
2017-07-20 15:09       ` Markus Armbruster
2017-07-20 15:46         ` Amador Pahim
2017-07-20 17:49         ` Eduardo Habkost
2017-07-20 20:14           ` Amador Pahim
2017-07-21  7:34             ` Lukáš Doktor [this message]
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 2/3] qemu.py: include debug information on launch error Amador Pahim
2017-07-20 11:58   ` Markus Armbruster
2017-07-20 13:14     ` Amador Pahim
2017-07-20 14:43       ` Eduardo Habkost
2017-07-20 15:51         ` Amador Pahim
2017-07-20 15:01       ` Markus Armbruster
2017-07-20 15:50         ` Amador Pahim
2017-07-20  9:19 ` [Qemu-devel] [PATCH v3 3/3] qemu.py: make 'args' public Amador Pahim
