From: g1pi@libero.it
To: qemu-devel@nongnu.org
Subject: possible race condition in qemu nat layer or virtio-net
Date: Sun, 25 Feb 2024 19:22:50 +0100
Message-ID: <ZduFeuu-QvWe6OG7@moon>

[-- Attachment #1: Type: text/plain, Size: 3179 bytes --]

Hi all.

I believe I have spotted a race condition in virtio-net or elsewhere in
qemu/kvm (it shows up only when virtio-net is involved).

To replicate, one needs a virtualization environment similar to the following.

Host:
- debian 12 x86_64, kernel 6.1.0-18-amd64
- caching name server listening on 127.0.0.1
- qemu version 7.2.9 (Debian 1:7.2+dfsg-7+deb12u5)
- command line:
    qemu-system-x86_64 \
        -enable-kvm \
        -daemonize \
        -parallel none \
        -serial none \
        -m 256 \
        -drive if=virtio,format=raw,file=void.raw \
        -monitor unix:run/void.mon,server,nowait \
        -nic user,model=virtio,hostfwd=tcp:127.0.0.1:3822-:22

Guest:
- x86_64, linux/musl or linux/glibc or freebsd or openbsd
- /etc/resolv.conf:
    nameserver 10.0.2.2         (the caching name server on the host)
    nameserver 192.168.1.123    (nonexistent)

Then run the attached program in the guest.

The program opens a UDP socket, sends out a bunch of (dns) requests,
poll()s on the socket, and then receives the responses.
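
As a cross-check, the request sent by the program can be decoded by hand.
The following is a minimal sketch, not part of the attached m.c: the helper
names (be16, dns_id, dns_rd_set, dns_qdcount) are made up for illustration,
and the byte string is copied from req[] in the attachment. It reads the
DNS header fields that tcpdump later prints as "33452+" with length (29).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bytes as req[] in the attached m.c (octal escapes). */
static const unsigned char dns_query[] =
    "\202\254\1\0\0\1\0\0\0\0\0\0\7example\3com\0\0\1\0\1";

/* Read a big-endian 16-bit field starting at byte offset off. */
static uint16_t be16(const unsigned char *p, size_t off) {
    return (uint16_t)((p[off] << 8) | p[off + 1]);
}

/* Hypothetical helper: transaction id, bytes 0-1 of the DNS header. */
static uint16_t dns_id(const unsigned char *msg, size_t len) {
    assert(msg != NULL && len >= 12); /* a DNS header is 12 bytes */
    return be16(msg, 0);
}

/* Hypothetical helper: 1 if the RD (recursion desired) flag is set. */
static int dns_rd_set(const unsigned char *msg, size_t len) {
    assert(msg != NULL && len >= 12);
    return (be16(msg, 2) & 0x0100) != 0;
}

/* Hypothetical helper: number of entries in the question section. */
static uint16_t dns_qdcount(const unsigned char *msg, size_t len) {
    assert(msg != NULL && len >= 12);
    return be16(msg, 4);
}
```

Called on dns_query with length sizeof dns_query - 1 (the trailing NUL of
the string literal is not sent), dns_id() yields 33452, dns_rd_set() is
true, and dns_qdcount() is 1, which matches what tcpdump shows for the
requests.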

If a delay is inserted between the sendto() calls, the (unique) response
from the host is received correctly:

    $ ./a.out 10.0.2.2 >/dev/null # to warm up the host cache
    $ ./a.out 10.0.2.2 delay 192.168.1.123
    poll: 1 1 1
    recvfrom() 45
    <response packet>
    recvfrom() -1

If the sendto()s are performed in quick succession, the response packet
gets lost:

    $ ./a.out 10.0.2.2 >/dev/null # to warm up the host cache
    $ ./a.out 10.0.2.2 192.168.1.123
    poll: 0 1 0
    recvfrom() -1
    recvfrom() -1

A tcpdump capture on the host side shows no difference between the two cases.

Tcpdump on the guest side is another story. In the good case, it looks like
this:

7:32:44.332 IP 10.0.2.15.43276 > 10.0.2.2.53: 33452+ A? example.com. (29)
7:32:44.333 IP 10.0.2.2.53 > 10.0.2.15.43276: 33452 1/0/0 A 93.184.216.34 (45)
7:32:44.349 IP 10.0.2.15.43276 > 192.168.1.123.53: 33452+ A? example.com. (29)

while in the bad case it looks like this:

7:32:55.358 IP 10.0.2.15.46537 > 10.0.2.2.53: 33452+ A? example.com. (29)
7:32:55.358 IP 10.0.2.15.46537 > 192.168.1.123.53: 33452+ A? example.com. (29)
7:32:55.358 IP *127.0.0.1*.53 > 10.0.2.15.46537: 33452 1/0/0 A 93.184.216.34 (45)

where the response packet has the wrong source IP (127.0.0.1 instead of
10.0.2.2).

This looks like a failure of the NAT layer, but it does not happen when
the guest uses another emulated network driver. I don't know whether that
is because the relevant code is in virtio-net or because other drivers add
overhead that masks the issue.

There's nothing special about port 53: I was just investigating
a weird failure in name resolution in a musl-based guest
(https://www.openwall.com/lists/musl/2024/02/17/3) and wrote the program
to mimic the musl resolver's behaviour.

The behaviour is the same with a different port: the delayed run succeeds
and the undelayed run fails consistently, in all guests I tried (as long as
the emulated network device is virtio-net).

To see the issue, it's important that the response to the first request
is so fast that it's simultaneous with the second request: that's the reason
behind the caching nameserver in the host.

I have also filed a bug report with Debian
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1064634).

I'm not subscribed to qemu-devel, so please CC me in replies.

Best regards,
        g.b.

[-- Attachment #2: m.c --]
[-- Type: text/x-csrc, Size: 2049 bytes --]

#include <stdio.h>
#include <time.h>
#include <poll.h>
#include <assert.h>
#include <string.h>

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Print len bytes of s; non-printable bytes and backslashes are shown
 * as octal escapes. */
static void dump(const char *s, size_t len) {
    while (len--) {
        char t = *s++;
        if (' ' <= t && t <= '~' && t != '\\')
            printf("%c", t);
        else
            printf("\\%o", t & 0xff);
    }
    printf("\n");
}

int main(int argc, char *argv[]) {
    int sock, rv, n;
    /* DNS query: id 33452, RD flag set, one question (example.com, A, IN) */
    const char req[] =
        "\202\254\1\0\0\1\0\0\0\0\0\0\7example\3com\0\0\1\0\1";
    struct timespec delay_l = { 1, 0 }; /* 1 sec */
    struct pollfd pfs;
    struct sockaddr_in me = { 0 };

    sock = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC | SOCK_NONBLOCK,
                  IPPROTO_IP);
    assert(sock >= 0);

    me.sin_family = AF_INET;
    me.sin_port = 0;
    me.sin_addr.s_addr = inet_addr("0.0.0.0");
    rv = bind(sock, (struct sockaddr *) &me, sizeof me);
    assert(0 == rv);

    for (n = 1; n < argc; n++) {
        if (0 == strcmp("delay", argv[n])) {
            struct timespec delay_s = { 0, (1 << 24) }; /* 2^24 ns, ~ 16.8 msec */
            nanosleep(&delay_s, NULL);
        } else {
            struct sockaddr_in dst = { 0 };
            dst.sin_family = AF_INET;
            dst.sin_port = htons(53);
            dst.sin_addr.s_addr = inet_addr(argv[n]);
            rv = sendto(sock, req, sizeof req - 1, MSG_NOSIGNAL,
                        (struct sockaddr *) &dst, sizeof dst);
            assert(rv >= 0);
        }
    }

    nanosleep(&delay_l, NULL);
    pfs.fd = sock;
    pfs.events = POLLIN;
    rv = poll(&pfs, 1, 2000);
    /* prints: poll() return value, requested events, returned events */
    printf("poll: %d %d %d\n", rv, pfs.events, pfs.revents);

    for (n = 1; n < argc; n++) {
        char resp[4000];
        if (0 == strcmp("delay", argv[n]))
            continue;
        rv = recvfrom(sock, resp, sizeof resp, 0, NULL, NULL);
        printf("recvfrom() %d\n", rv);
        if (rv > 0)
            dump(resp, rv);
    }

    return 0;
}
