From: "Daniel P. Berrange" <berrange@redhat.com>
To: Knut Omang <knut.omang@oracle.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port
Date: Mon, 26 Jun 2017 11:22:54 +0100	[thread overview]
Message-ID: <20170626102254.GG495@redhat.com> (raw)
In-Reply-To: <51d7f54d100e9dedecf6dc65691ca65adfc8394f.1498213152.git-series.knut.omang@oracle.com>

On Fri, Jun 23, 2017 at 12:31:08PM +0200, Knut Omang wrote:
> If a range of ports is specified to the inet_listen_saddr() function,
> and two or more processes try to bind to ports from that range at the
> same time, occasionally more than one process may be able to bind to
> the same port. The condition is detected by listen(), but too late to
> avoid a failure.
> 
> This function is called by socket_listen() and used by all socket
> listening code in QEMU, so every case where any form of dynamic port
> selection is used may be subject to this issue.
> 
> Add code to close and re-establish the socket when this
> condition is observed, hiding the race condition from the user.
> 
> This has been developed and tested by means of the
> test-listen unit test in the previous commit.
> Enable the test for make check now that it passes.
> 
> Signed-off-by: Knut Omang <knut.omang@oracle.com>
> Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
> Reviewed-by: Girish Moodalbail <girish.moodalbail@oracle.com>
> ---
>  tests/Makefile.include |  2 +-
>  util/qemu-sockets.c    | 68 ++++++++++++++++++++++++++++++++-----------
>  2 files changed, 53 insertions(+), 17 deletions(-)
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 22bb97e..c38f94e 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -127,7 +127,7 @@ check-unit-y += tests/test-bufferiszero$(EXESUF)
>  gcov-files-check-bufferiszero-y = util/bufferiszero.c
>  check-unit-y += tests/test-uuid$(EXESUF)
>  check-unit-y += tests/ptimer-test$(EXESUF)
> -#check-unit-y += tests/test-listen$(EXESUF)
> +check-unit-y += tests/test-listen$(EXESUF)
>  gcov-files-ptimer-test-y = hw/core/ptimer.c
>  check-unit-y += tests/test-qapi-util$(EXESUF)
>  gcov-files-test-qapi-util-y = qapi/qapi-util.c
> diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
> index 48b9319..7b118b4 100644
> --- a/util/qemu-sockets.c
> +++ b/util/qemu-sockets.c
> @@ -201,6 +201,42 @@ static int try_bind(int socket, InetSocketAddress *saddr, struct addrinfo *e)
>  #endif
>  }
>  
> +static int try_bind_listen(int *socket, InetSocketAddress *saddr,
> +                           struct addrinfo *e, int port, Error **errp)
> +{
> +    int s = *socket;
> +    int ret;
> +
> +    inet_setport(e, port);
> +    ret = try_bind(s, saddr, e);
> +    if (ret) {
> +        if (errno != EADDRINUSE) {
> +            error_setg_errno(errp, errno, "Failed to bind socket");
> +        }
> +        return errno;
> +    }
> +    if (listen(s, 1) == 0) {
> +        return 0;
> +    }
> +    if (errno == EADDRINUSE) {
> +        /* We managed to bind the socket to the port, but someone else
> +         * bound the same port and beat us to listen() on it.
> +         * Recreate the socket and return EADDRINUSE to preserve the
> +         * state the caller expects:
> +         */
> +        closesocket(s);
> +        s = create_fast_reuse_socket(e, errp);
> +        if (s < 0) {
> +            return errno;
> +        }
> +        *socket = s;

I don't really like this at all - if we need to close + recreate the
socket, IMHO that should remain the job of the caller, since it owns
the socket FD ultimately.
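
(As an aside, for anyone reading along: the race the commit message
describes is easy to reproduce outside QEMU. A minimal standalone
sketch - assuming Linux SO_REUSEADDR semantics and an arbitrary test
port, not part of the patch:

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Create a TCP socket with SO_REUSEADDR and bind it to *sa.
     * With SO_REUSEADDR set on both sockets and neither listening yet,
     * the second bind() to the same port also succeeds. */
    static int bound_reuse_socket(struct sockaddr_in *sa)
    {
        int one = 1;
        int s = socket(AF_INET, SOCK_STREAM, 0);

        setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        if (bind(s, (struct sockaddr *)sa, sizeof(*sa)) < 0) {
            perror("bind");
        }
        return s;
    }

    int main(void)
    {
        struct sockaddr_in sa;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sa.sin_port = htons(5901);                 /* arbitrary test port */

        int a = bound_reuse_socket(&sa);
        int b = bound_reuse_socket(&sa);

        printf("listen(a) = %d\n", listen(a, 1));  /* 0: wins the port */
        printf("listen(b) = %d\n", listen(b, 1));  /* -1, errno == EADDRINUSE */

        close(a);
        close(b);
        return 0;
    }

The conflict only surfaces at the second listen(), which is exactly the
window this code is trying to handle.)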

> +        errno = EADDRINUSE;
> +        return errno;
> +    }
> +    error_setg_errno(errp, errno, "Failed to listen on socket");
> +    return errno;
> +}
> +
>  static int inet_listen_saddr(InetSocketAddress *saddr,
>                               int port_offset,
>                               bool update_addr,
> @@ -210,7 +246,9 @@ static int inet_listen_saddr(InetSocketAddress *saddr,
>      char port[33];
>      char uaddr[INET6_ADDRSTRLEN+1];
>      char uport[33];
> -    int slisten, rc, port_min, port_max, p;
> +    int rc, port_min, port_max, p;
> +    int slisten = 0;
> +    int saved_errno = 0;
>      Error *err = NULL;
>  
>      memset(&ai,0, sizeof(ai));
> @@ -276,28 +314,26 @@ static int inet_listen_saddr(InetSocketAddress *saddr,

Just above this line is the original 'create_fast_reuse_socket' call.

I'd suggest that we push that call down into the body of the loop
below:

>          port_min = inet_getport(e);
>          port_max = saddr->has_to ? saddr->to + port_offset : port_min;
>          for (p = port_min; p <= port_max; p++) {
> -            inet_setport(e, p);
> -            if (try_bind(slisten, saddr, e) >= 0) {
> -                goto listen;
> -            }
> -            if (p == port_max) {
> -                if (!e->ai_next) {
> -                    error_setg_errno(errp, errno, "Failed to bind socket");
> -                }
> +            int eno = try_bind_listen(&slisten, saddr, e, p, &err);

Which would mean try_bind_listen no longer needs the magic to close +
recreate the socket.

The only cost of doing this is that you end up closing + recreating the
socket after bind hits EADDRINUSE, as well as after listen() hits it.

I think that's an acceptable tradeoff for simpler code, since this is
not a performance-critical operation.
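
Roughly the shape I have in mind - untested, just a sketch built from
the helpers already in this series (create_fast_reuse_socket, try_bind,
inet_setport), keeping your convention of returning an errno value:

    static int try_bind_listen(int slisten, InetSocketAddress *saddr,
                               struct addrinfo *e, int port, Error **errp)
    {
        inet_setport(e, port);
        if (try_bind(slisten, saddr, e) < 0) {
            if (errno != EADDRINUSE) {
                error_setg_errno(errp, errno, "Failed to bind socket");
            }
            return errno;
        }
        if (listen(slisten, 1) < 0) {
            if (errno != EADDRINUSE) {
                error_setg_errno(errp, errno, "Failed to listen on socket");
            }
            return errno;
        }
        return 0;
    }

and in the loop in inet_listen_saddr():

        for (p = port_min; p <= port_max; p++) {
            int eno;

            slisten = create_fast_reuse_socket(e, &err);
            if (slisten < 0) {
                goto listen_failed;
            }
            eno = try_bind_listen(slisten, saddr, e, p, &err);
            if (!eno) {
                goto listen_ok;
            }
            closesocket(slisten);
            if (eno != EADDRINUSE) {
                goto listen_failed;
            }
            /* EADDRINUSE: fall through and retry with the next port */
        }

That way the helper never owns, closes or recreates the FD - it only
binds and listens on the socket it was given.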

> +            if (!eno) {
> +                goto listen_ok;
> +            } else if (eno != EADDRINUSE) {
> +                goto listen_failed;
>              }
>          }
> +    }
> +    error_setg_errno(errp, errno, "Failed to find available port");

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Thread overview: 15+ messages
2017-06-23 10:31 [Qemu-devel] [PATCH v4 0/4] Unit test+fix for problem with QEMU handling of multiple bind()s to the same port Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 1/4] tests: Add test-listen - a stress test for QEMU socket listen Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 2/4] sockets: factor out create_fast_reuse_socket Knut Omang
2017-06-26 10:28   ` Daniel P. Berrange
2017-06-26 11:56     ` Knut Omang
2017-06-26 12:00       ` Daniel P. Berrange
2017-07-02  6:26     ` Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 3/4] sockets: factor out a new try_bind() function Knut Omang
2017-06-23 10:31 ` [Qemu-devel] [PATCH v4 4/4] sockets: Handle race condition between binds to the same port Knut Omang
2017-06-26 10:22   ` Daniel P. Berrange [this message]
2017-06-26 12:32     ` Knut Omang
2017-06-26 12:49       ` Daniel P. Berrange
2017-07-02  8:17         ` Knut Omang
2017-06-26 10:34   ` Daniel P. Berrange
2017-07-02  8:15     ` Knut Omang
