Re: [PATCH] tests/qemu_iotests: Minimize usage of used ports

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: "Lukáš Doktor" <ldoktor@redhat.com>, qemu-block@nongnu.org
Cc: kwolf@redhat.com, Stefano Garzarella <sgarzare@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	Aarushi Mehta <mehta.aaru20@gmail.com>
Subject: Re: [PATCH] tests/qemu_iotests: Minimize usage of used ports
Date: Thu, 6 Feb 2020 17:37:45 +0100	[thread overview]
Message-ID: <726ca911-be83-c2d5-ff3f-efa32bc2233e@redhat.com> (raw)
In-Reply-To: <595ffedf-4213-f000-96e1-3216e272b1c1@redhat.com>

On 06.02.20 17:27, Lukáš Doktor wrote:
> Dne 06. 02. 20 v 16:03 Max Reitz napsal(a):
>> On 03.02.20 08:59, Lukáš Doktor wrote:
>>> Using a range of ports from 32768 to 65538 is dangerous as some
>>> application might already be listening there and interfere with the
>>> testing. There is no way to reserve ports, but let's decrease the chance
>>> of interactions by only using ports that were free at the time of
>>> importing this module.
>>>
>>> Without this patch CI occasionally fails with used ports. Additionally I
>>> tried listening on the first port to be tried via "nc -l localhost
>>> $port" and no matter how many other ports were available it always
>>> hanged for infinity.
>>
>> I’m afraid I don’t quite understand.  The new functions check whether
>> the ports are available for use by creating a server on them (i.e.,
>> binding a socket there).  The current code just lets qemu create a
>> server there, and see if that works or not.
>>
>> So the only difference I can see is that instead of trying out random
>> ports during the test and see whether they’re free to use we do this
>> check only once when the test is started.
>>
>> And the only problem I can imagine from your description is that there
>> is some other tool on the system that tries to set up a server but
>> cannot because we already have an NBD server there (by accident).
>>
>> But I don’t see how checking for free ports once at startup solves that
>> problem reliably.
>>
>> If what I guessed above is right, the only reliable solution I can
>> imagine would be to allow users to specify the port range through
>> environment variables, and then you’d have to specify a range that you
>> know is free for use.
>>
>> Max
>>
> 
> Hello Max,
> 
> thank you and I am sorry for not digging deep enough. This week my CI failed with:
> 
> 01:24:06 DEBUG| [stdout] +ERROR: test_inet (__main__.QemuNBD)
> 01:24:06 DEBUG| [stdout] +----------------------------------------------------------------------
> 01:24:06 DEBUG| [stdout] +Traceback (most recent call last):
> 01:24:06 DEBUG| [stdout] +  File "147", line 85, in setUp
> 01:24:06 DEBUG| [stdout] +    self.vm.launch()
> 01:24:06 DEBUG| [stdout] +  File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 302, in launch
> 01:24:06 DEBUG| [stdout] +    self._launch()
> 01:24:06 DEBUG| [stdout] +  File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 319, in _launch
> 01:24:06 DEBUG| [stdout] +    self._pre_launch()
> 01:24:06 DEBUG| [stdout] +  File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qtest.py", line 106, in _pre_launch
> 01:24:06 DEBUG| [stdout] +    super(QEMUQtestMachine, self)._pre_launch()
> 01:24:06 DEBUG| [stdout] +  File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/machine.py", line 270, in _pre_launch
> 01:24:06 DEBUG| [stdout] +    self._qmp = qmp.QEMUMonitorProtocol(self._vm_monitor, server=True)
> 01:24:06 DEBUG| [stdout] +  File "/home/jenkins/ppc64le/qemu-master/tests/qemu-iotests/../../python/qemu/qmp.py", line 60, in __init__
> 01:24:06 DEBUG| [stdout] +    self.__sock.bind(self.__address)
> 01:24:06 DEBUG| [stdout] +OSError: [Errno 98] Address already in use
> 
> I made the mistake of reproducing this on my home system using the qemu revision that I had and assuming it's caused by a used port. So I limited the port range and used nc to occupy the port. It sort-of reproduced but instead of Address already in use it hanged until I kill the nc process. Then it failed with:
> 
> +Traceback (most recent call last):
> +  File "147", line 124, in test_inet
> +    flatten_sock_addr(address))
> +  File "147", line 59, in client_test
> +    self.assert_qmp(result, 'return', {})
> +  File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line 821, in assert_qmp
> +    result = self.dictpath(d, path)
> +  File "/home/medic/Work/Projekty/qemu/tests/qemu-iotests/iotests.py", line 797, in dictpath
> +    self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
> +AssertionError: failed path traversal for "return" in "{'error': {'class': 'GenericError', 'desc': 'Failed to read initial magic: Unexpected end-of-file before all bytes were read'}}"
> 
> After a brief study I thought qemu is not doing the job well enough and wanted to add a protection. Anyway after a more thorough overview I came to a different conclusion and that is that we are facing the same issue as with incoming migration about a year ago. What happened is that I started "nc -l localhost 32789" which results in:
> 
> COMMAND   PID  USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc      26758 medic    3u  IPv6 9579487      0t0  TCP localhost:32789 (LISTEN)
> 
> Then we start the server by "_try_server_up" where qemu-nbd detects the port is occupied on IPv6 but available on IPv4, so it claims it:
> COMMAND   PID  USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
> nc        26758 medic    3u  IPv6 9579487      0t0  TCP localhost:32789 (LISTEN)
> qemu-nbd  26927 medic    4u  IPv4 9591857      0t0  TCP localhost:32789 (LISTEN)
> 
> and reports success. Then we try to connect but the hotplugged VM first attempts to connect on the IPv6 address and hangs for infinity.
> 
> Now is this an expected behavior? If so then we need the find_free_address (but preferably directly in _try_server_up just before starting the qemu-nbd) to leave as little time-frame for collision as possible. Otherwise the test is alright and qemu-nbd needs a fix to bail out in case some address is already used (IIRC this is what incoming migration does).

Ah, OK.

Well, expected behavior...  It’s a shame, that’s what it is.

> My second mistake was testing this on the old code-base and rebasing it only before sending the patch (without testing after the rebase). If I were to test it first, I would have found out that the real reproducer is simply running the test as the commit 8dff69b9415b4287e900358744b732195e1ab2e2 broke it.
> 
> 
> So basically there are 2 actions:
> 
> 1. fix the test as on my system it fails in 100% of cases, bisect says the first bad commit is 8dff69b9415b4287e900358744b732195e1ab2e2. Would anyone have time in digging into this? I already spent way too much time on this and don't really know what that commit is trying to do.

Yep, I’ve sent a patch:

https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00294.html

> 2. decide on the behavior when IPv4/6 is already in use (bail-out or start).
> 2a. In case it should bail-out than the test is correct and there is no need for my patch. On the other hand qemu-nbd has to be fixed

I don’t think it makes much sense to let qemu’s NBD server ensure that
it claims both IPv4 and IPv6 in case the user specifies a
non-descriptive hostname.

> 2b. Otherwise I can send a v2 that will check the port in the _try_server_up just before starting qemu-nbd to minimize the risk of using a utilized port (or should you decide it's not worth checking, I can simply forget about this)

Hm.  It wouldn’t be fully reliable, but, well...  The risk would be minimal.

OTOH, would it work if we just did a %s/localhost/127.0.0.1/ in the
test?  We have specific cases for IPv6, so I think it makes sense to
force IPv4 in all other cases.

Max

next prev parent reply	other threads:[~2020-02-06 16:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-03  7:59 [PATCH] tests/qemu_iotests: Minimize usage of used ports Lukáš Doktor
2020-02-03 15:32 ` Eric Blake
2020-02-06 15:03 ` Max Reitz
2020-02-06 16:27   ` Lukáš Doktor
2020-02-06 16:37     ` Max Reitz [this message]
2020-02-06 16:48       ` Eric Blake
2020-02-06 16:59         ` Max Reitz
2020-02-06 18:33           ` Lukáš Doktor
2020-02-07  8:24             ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=726ca911-be83-c2d5-ff3f-efa32bc2233e@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=ldoktor@redhat.com \
    --cc=mehta.aaru20@gmail.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).