All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philipp Hahn <hahn@univention.de>
To: linux-kernel@vger.kernel.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race condition that leads to zero port ID
Date: Fri, 27 Nov 2015 16:23:54 +0100	[thread overview]
Message-ID: <5658758A.3020602@univention.de> (raw)

[-- Attachment #1: Type: text/plain, Size: 5997 bytes --]

Hello,

I observed a strange stall in a multi-threaded process (originally
OpenLDAP slapd), which stalls during a call to getaddrinfo() after
upgrading the kernel from 3.16.x to 4.1.12:

> #0  0x00007f8e6d97b87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #1  0x00007f8e6d99bfbc in make_request (fd=12, pid=2871, seen_ipv4=<optimized out>, seen_ipv6=<optimized out>, in6ai=<optimized out>, in6ailen=<optimized out>) at ../sysdeps/unix/sysv/linux/check_pf.c:119
> #2  0x00007f8e6d99c42a in __check_pf (seen_ipv4=0x7f8e6a0969ff, seen_ipv6=0x7f8e6a0969fe, in6ai=0x7f8e6a0969e0, in6ailen=0x7f8e6a0969d8) at ../sysdeps/unix/sysv/linux/check_pf.c:272
> #3  0x00007f8e6d959007 in *__GI_getaddrinfo (name=0x7f8e6a096a80 "0.0.0.0", service=0x400d04 "7636", hints=0x7f8e6a096a50, pai=0x7f8e6a096a48) at ../sysdeps/posix/getaddrinfo.c:2389

(The glibc tries to get the routing table to see via netlink, if it
should support IPv4 and/or IPv6. Neither
<https://sourceware.org/bugzilla/show_bug.cgi?id=12926> nor
<https://sourceware.org/git/?p=glibc.git;a=commit;h=636064eb4c03397c86aa26e489e68f952bd5e53f>
fixed the problem.)

I've been able to reproduce it by writing the attached program, which
calls getaddrinfo() from 10 thread 10^6 times. On my 8-core Intel I see
several threads hanging in the above call after some seconds - you don't
have to wait for the process to terminate):

> gdb --batch --ex 'set pagination off' --ex 'thread apply all bt' -p `pidof a.out` | grep rec
...
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82
> #0  0x00007f3817bcc87d in recvmsg () at ../sysdeps/unix/syscall-template.S:82

Using "git bisect start v4.1.11 v.4.10" I've been able to identify
4e27762417669cb459971635be550eb7b5598286 (v4.1.9-26-g4e27762), which is
a back-port of 1f770c0a09da855a2b51af6d19de97fb955eca85
(v4.2-11228-g1f770c0). I've attached the log.

> # git log -1 4e27762417669cb459971635be550eb7b5598286
> commit 4e27762417669cb459971635be550eb7b5598286
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Fri Sep 18 19:16:50 2015 +0800
> 
>     netlink: Fix autobind race condition that leads to zero port ID
>     
>     [ Upstream commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ]
>     
>     The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
>     Reset portid after netlink_insert failure") introduced a race
>     condition where if two threads try to autobind the same socket
>     one of them may end up with a zero port ID.  This led to kernel
>     deadlocks that were observed by multiple people.
>     
>     This patch reverts that commit and instead fixes it by introducing
>     a separte rhash_portid variable so that the real portid is only set
>     after the socket has been successfully hashed.
>     
>     Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
>     Reported-by: Tejun Heo <tj@kernel.org>
>     Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>     Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

I verified 4e27762417669cb459971635be550eb7b5598286~1 does not show the
problem.

v4.1.13 still has the problem.
> # git log --oneline  v4.1..v4.1.13 -- net/netlink/af_netlink.c
> e962218 netlink: Trim skb to alloc size to avoid MSG_TRUNC
> d486236 netlink: Replace rhash_portid with bound
> 4e27762 netlink: Fix autobind race condition that leads to zero port ID
> 65d48c6 netlink, mmap: transform mmap skb into full skb on taps
> d397617 netlink: make sure -EBUSY won't escape from netlink_insert
> b265c30 netlink: don't hold mutex in rcu callback when releasing mmapd ring

v4.3 does not show the problem.
> # git log --oneline  v4.2..v4.3 -- net/netlink/af_netlink.c
> 47191d6 netlink: fix locking around NETLINK_LIST_MEMBERSHIPS
> db65a3a netlink: Trim skb to alloc size to avoid MSG_TRUNC
> da314c9 netlink: Replace rhash_portid with bound
> 1f770c0 netlink: Fix autobind race condition that leads to zero port ID
> 1853c94 netlink, mmap: transform mmap skb into full skb on taps
> 6bb0fef netlink, mmap: fix edge-case leakages in nf queue zero-copy
> a66e365 netlink, mmap: don't walk rx ring on poll if receive queue non-empty
> 0ef7077 netlink: rx mmap: fix POLLIN condition
> 7084a31 netlink: mmap: fix lookup frame position
> 0a6a3a2 netlink: add NETLINK_CAP_ACK socket option

v4.2 does not show the problem (as expected, as the patch isn't in 4.2.0
> # git log --oneline  v4.2..v4.2.6 -- net/netlink/af_netlink.c
> 326ce2c netlink: Trim skb to alloc size to avoid MSG_TRUNC
> e2a3131 netlink: Replace rhash_portid with bound
> 6e32e73 netlink: Fix autobind race condition that leads to zero port ID
> 62f43b5 netlink, mmap: transform mmap skb into full skb on taps ring

I no longer have an idea where to look next, as something outside of
af_netlink.c seems to have fixed the issue for at least 4.3. Any idea?

Kernel: 4.1.12 (Debian)
GLIBC: 2.13-38+deb7u8
CPU: Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz

Thank you and a (hopefully) happy weekend.
Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 40059_glic-NL.c --]
[-- Type: text/x-csrc; name="40059_glic-NL.c", Size: 2189 bytes --]

/*
 * <https://forge.univention.org/bugzilla/show_bug.cgi?id=40059>
 */
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <grp.h>
#include <string.h>
#include <limits.h>
#include <stdarg.h>
#include <ctype.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#define ARRAY_SIZE(a) ((sizeof (a)) / sizeof (*(a)))
#define BUFLEN 4096

// cyrus-sasl
int check(void) {
	const char *addr = "0.0.0.0;7636";
	char hbuf[NI_MAXHOST];
	struct addrinfo hints, *ai = NULL;
	int i, j;

	/* Parse the address */
	for (i = 0; addr[i] != '\0' && addr[i] != ';'; i++) {
		if (i >= NI_MAXHOST)
			return 3;
		hbuf[i] = addr[i];
	}
	hbuf[i] = '\0';

	if (addr[i] == ';')
		i++;
	/* XXX: Do we need this check? */
	for (j = i; addr[j] != '\0'; j++)
		if (!isdigit((int)(addr[j])))
			return 2;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_PASSIVE | AI_NUMERICHOST;
	if (getaddrinfo(hbuf, &addr[i], &hints, &ai) != 0)
		return 1;

	freeaddrinfo(ai);

	return 0;
}

static pthread_t thread[10];

static void *run(void *data) {
	unsigned int round, j;

	unsigned int self = (unsigned int)(unsigned long)data;
	pthread_t tid = pthread_self();

	printf("[%d] POSIX thread ID is %u\n", self, (unsigned int)tid);

	for (round = 0; round < 1000; round++) {
		for (j = 0; j < 1000; j++) {
			int i;
			i = check();
			if (i)
				break;
		}
		printf("%d\n", round);

		unsigned int usecs = (round * (unsigned int)tid) % 1000000;
		usleep(usecs);
		if (round % 10 == 0)
			putchar('0' + (self % 10));
	}

	printf("[%d] end\n", self);
	pthread_exit(NULL);
}

int main(int argc, char *argv[], char *envp[]) {
	pid_t pid = getpid();
	printf("pid=%d\n", pid);

	unsigned int t;
	for (t = 0; t < ARRAY_SIZE(thread); t++)
		pthread_create(&thread[t], NULL, run, (void *)(long)t);

	int ret = 0;
	void *value_ptr;
	for (t = 0; t < ARRAY_SIZE(thread); t++)
		ret |= pthread_join(thread[t], &value_ptr);

	return ret;
}
/* :!gcc -O0 -g -Wall -lpthread % */

[-- Attachment #3: 40059_linux-NL.bisect --]
[-- Type: text/plain, Size: 1832 bytes --]

git bisect start
# bad: [205a8514e63a221c3a5071f27521013444e43e5f] Linux 4.1.11
git bisect bad 205a8514e63a221c3a5071f27521013444e43e5f
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect good b953c0d234bc72e8489d3bf51a276c5c4ec85345
# good: [4b6bf170e83e666974ce00d7251dd0d4b5ca5439] ocfs2: fix shift left overflow
git bisect good 4b6bf170e83e666974ce00d7251dd0d4b5ca5439
# good: [0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48] ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
git bisect good 0796f55ba4db268f3a6a6c6bb9e481ce9eaf5d48
# bad: [dc7a3d707c9784f26048d6f059e7467a8a92e04b] spi: Fix documentation of spi_alloc_master()
git bisect bad dc7a3d707c9784f26048d6f059e7467a8a92e04b
# good: [8bb9225a7bd3a07aadd58b19a633f1f7a8884d01] net: eth: altera: fix napi poll_list corruption
git bisect good 8bb9225a7bd3a07aadd58b19a633f1f7a8884d01
# bad: [b6ac3aee1c73d5b107ae6ec6e3715b39db782781] iser-target: Put the reference on commands waiting for unsol data
git bisect bad b6ac3aee1c73d5b107ae6ec6e3715b39db782781
# bad: [4e27762417669cb459971635be550eb7b5598286] netlink: Fix autobind race condition that leads to zero port ID
git bisect bad 4e27762417669cb459971635be550eb7b5598286
# good: [cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c] macvtap: fix TUNSETSNDBUF values > 64k
git bisect good cf9cf6bc253c5d83a8cb593d4ed7042bb4f04e7c
# good: [fce134644303a4ebe0892edc9af7d46e0ebda9ab] fib_rules: fix fib rule dumps across multiple skbs
git bisect good fce134644303a4ebe0892edc9af7d46e0ebda9ab
# good: [282117acdfac7b5d621b18f002a9751d7f583daf] net: phy: fixed_phy: handle link-down case
git bisect good 282117acdfac7b5d621b18f002a9751d7f583daf
# good: [d600176461e85a0bf3b963587e4ca92d494a7a47] mvneta: use inband status only when explicitly enabled
git bisect good d600176461e85a0bf3b963587e4ca92d494a7a47

             reply	other threads:[~2015-11-27 15:32 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-27 15:23 Philipp Hahn [this message]
2015-12-05  1:10 ` Strange PF_NETLINK NETLINK_ROUTE stall: netlink: Fix autobind race condition that leads to zero port ID Herbert Xu
2015-12-05 14:16   ` Philipp Matthias Hahn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5658758A.3020602@univention.de \
    --to=hahn@univention.de \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.