Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net 1/1] qed: Fix mask for physical address in ILT entry
From: David Miller @ 2018-05-22 19:33 UTC (permalink / raw)
  To: shahed.shaikh; +Cc: netdev, Ariel.Elior, Dept-EngEverestLinuxL2
In-Reply-To: <20180521193147.18628-1-shahed.shaikh@cavium.com>

From: Shahed Shaikh <shahed.shaikh@cavium.com>
Date: Mon, 21 May 2018 12:31:47 -0700

> ILT entry requires 12 bit right shifted physical address.
> Existing mask for ILT entry of physical address i.e.
> ILT_ENTRY_PHY_ADDR_MASK is not sufficient to handle 64bit
> address because upper 8 bits of 64 bit address were getting
> masked which resulted in completer abort error on
> PCIe bus due to invalid address.
> 
> Fix that mask to handle 64bit physical address.
> 
> Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support")
> 
> Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
> Signed-off-by: Ariel Elior <ariel.elior@cavium.com>

Please do not put an empty line between Fixes: and other tags in the
future.  I fixed it up this time.

Applied, thanks.

^ permalink raw reply

* [PATCH] selftests: uevent filtering
From: Christian Brauner @ 2018-05-22 19:34 UTC (permalink / raw)
  To: shuah, keescook, tglx, kstewart, gregkh, mic, linux-kernel,
	linux-kselftest
  Cc: ebiederm, netdev, davem, Christian Brauner

Recent discussions around uevent filtering (cf. net-next commit [1], [2],
and [3] and discussions in [4], [5], and [6]) have shown that the semantics
around uevent filtering where not well understood.
Now that we have settled - at least for the moment - how uevent filtering
should look like let's add some selftests to ensure we don't regress
anything in the future.
Note, the semantics of uevent filtering are described in detail in my
commit message to [2] so I won't repeat them here.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=90d52d4fd82007005125d9a8d2d560a1ca059b9d
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a3498436b3a0f8ec289e6847e1de40b4123e1639
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=26045a7b14bc7a5455e411d820110f66557d6589
[4]: https://lkml.org/lkml/2018/4/4/739
[5]: https://lkml.org/lkml/2018/4/26/767
[6]: https://lkml.org/lkml/2018/4/26/738

Signed-off-by: Christian Brauner <christian@brauner.io>
---
 tools/testing/selftests/uevent/Makefile       |  17 +
 tools/testing/selftests/uevent/config         |   2 +
 .../selftests/uevent/uevent_filtering.c       | 486 ++++++++++++++++++
 3 files changed, 505 insertions(+)
 create mode 100644 tools/testing/selftests/uevent/Makefile
 create mode 100644 tools/testing/selftests/uevent/config
 create mode 100644 tools/testing/selftests/uevent/uevent_filtering.c

diff --git a/tools/testing/selftests/uevent/Makefile b/tools/testing/selftests/uevent/Makefile
new file mode 100644
index 000000000000..f7baa9aa2932
--- /dev/null
+++ b/tools/testing/selftests/uevent/Makefile
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+include ../lib.mk
+
+.PHONY: all clean
+
+BINARIES := uevent_filtering
+CFLAGS += -Wl,-no-as-needed -Wall
+
+uevent_filtering: uevent_filtering.c ../kselftest.h ../kselftest_harness.h
+	$(CC) $(CFLAGS) $< -o $@
+
+TEST_PROGS += $(BINARIES)
+EXTRA_CLEAN := $(BINARIES)
+
+all: $(BINARIES)
diff --git a/tools/testing/selftests/uevent/config b/tools/testing/selftests/uevent/config
new file mode 100644
index 000000000000..1038f4515be8
--- /dev/null
+++ b/tools/testing/selftests/uevent/config
@@ -0,0 +1,2 @@
+CONFIG_USER_NS=y
+CONFIG_NET=y
diff --git a/tools/testing/selftests/uevent/uevent_filtering.c b/tools/testing/selftests/uevent/uevent_filtering.c
new file mode 100644
index 000000000000..f83391aa42cf
--- /dev/null
+++ b/tools/testing/selftests/uevent/uevent_filtering.c
@@ -0,0 +1,486 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/netlink.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sched.h>
+#include <sys/eventfd.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+#include "../kselftest_harness.h"
+
+#define __DEV_FULL "/sys/devices/virtual/mem/full/uevent"
+#define __UEVENT_BUFFER_SIZE (2048 * 2)
+#define __UEVENT_HEADER "add@/devices/virtual/mem/full"
+#define __UEVENT_HEADER_LEN sizeof("add@/devices/virtual/mem/full")
+#define __UEVENT_LISTEN_ALL -1
+
+ssize_t read_nointr(int fd, void *buf, size_t count)
+{
+	ssize_t ret;
+
+again:
+	ret = read(fd, buf, count);
+	if (ret < 0 && errno == EINTR)
+		goto again;
+
+	return ret;
+}
+
+ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+again:
+	ret = write(fd, buf, count);
+	if (ret < 0 && errno == EINTR)
+		goto again;
+
+	return ret;
+}
+
+int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (ret != pid)
+		goto again;
+
+	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
+		return -1;
+
+	return 0;
+}
+
+static int uevent_listener(unsigned long post_flags, bool expect_uevent,
+			   int sync_fd)
+{
+	int sk_fd, ret;
+	socklen_t sk_addr_len;
+	int fret = -1, rcv_buf_sz = __UEVENT_BUFFER_SIZE;
+	uint64_t sync_add = 1;
+	struct sockaddr_nl sk_addr = { 0 }, rcv_addr = { 0 };
+	char buf[__UEVENT_BUFFER_SIZE] = { 0 };
+	struct iovec iov = { buf, __UEVENT_BUFFER_SIZE };
+	char control[CMSG_SPACE(sizeof(struct ucred))];
+	struct msghdr hdr = {
+		&rcv_addr, sizeof(rcv_addr), &iov, 1,
+		control,   sizeof(control),  0,
+	};
+
+	sk_fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC,
+		       NETLINK_KOBJECT_UEVENT);
+	if (sk_fd < 0) {
+		fprintf(stderr, "%s - Failed to open uevent socket\n", strerror(errno));
+		return -1;
+	}
+
+	ret = setsockopt(sk_fd, SOL_SOCKET, SO_RCVBUF, &rcv_buf_sz,
+			 sizeof(rcv_buf_sz));
+	if (ret < 0) {
+		fprintf(stderr, "%s - Failed to set socket options\n", strerror(errno));
+		goto on_error;
+	}
+
+	sk_addr.nl_family = AF_NETLINK;
+	sk_addr.nl_groups = __UEVENT_LISTEN_ALL;
+
+	sk_addr_len = sizeof(sk_addr);
+	ret = bind(sk_fd, (struct sockaddr *)&sk_addr, sk_addr_len);
+	if (ret < 0) {
+		fprintf(stderr, "%s - Failed to bind socket\n", strerror(errno));
+		goto on_error;
+	}
+
+	ret = getsockname(sk_fd, (struct sockaddr *)&sk_addr, &sk_addr_len);
+	if (ret < 0) {
+		fprintf(stderr, "%s - Failed to retrieve socket name\n", strerror(errno));
+		goto on_error;
+	}
+
+	if ((size_t)sk_addr_len != sizeof(sk_addr)) {
+		fprintf(stderr, "Invalid socket address size\n");
+		goto on_error;
+	}
+
+	if (post_flags & CLONE_NEWUSER) {
+		ret = unshare(CLONE_NEWUSER);
+		if (ret < 0) {
+			fprintf(stderr,
+				"%s - Failed to unshare user namespace\n",
+				strerror(errno));
+			goto on_error;
+		}
+	}
+
+	if (post_flags & CLONE_NEWNET) {
+		ret = unshare(CLONE_NEWNET);
+		if (ret < 0) {
+			fprintf(stderr,
+				"%s - Failed to unshare network namespace\n",
+				strerror(errno));
+			goto on_error;
+		}
+	}
+
+	ret = write_nointr(sync_fd, &sync_add, sizeof(sync_add));
+	close(sync_fd);
+	if (ret != sizeof(sync_add)) {
+		fprintf(stderr, "Failed to synchronize with parent process\n");
+		goto on_error;
+	}
+
+	fret = 0;
+	for (;;) {
+		ssize_t r;
+
+		r = recvmsg(sk_fd, &hdr, 0);
+		if (r <= 0) {
+			fprintf(stderr, "%s - Failed to receive uevent\n", strerror(errno));
+			ret = -1;
+			break;
+		}
+
+		/* ignore libudev messages */
+		if (memcmp(buf, "libudev", 8) == 0)
+			continue;
+
+		/* ignore uevents we didn't trigger */
+		if (memcmp(buf, __UEVENT_HEADER, __UEVENT_HEADER_LEN) != 0)
+			continue;
+
+		if (!expect_uevent) {
+			fprintf(stderr, "Received unexpected uevent:\n");
+			ret = -1;
+		}
+
+		if (TH_LOG_ENABLED) {
+			/* If logging is enabled dump the received uevent. */
+			(void)write_nointr(STDERR_FILENO, buf, r);
+			(void)write_nointr(STDERR_FILENO, "\n", 1);
+		}
+
+		break;
+	}
+
+on_error:
+	close(sk_fd);
+
+	return fret;
+}
+
+int trigger_uevent(unsigned int times)
+{
+	int fd, ret;
+	unsigned int i;
+
+	fd = open(__DEV_FULL, O_RDWR | O_CLOEXEC);
+	if (fd < 0) {
+		if (errno != ENOENT)
+			return -EINVAL;
+
+		return -1;
+	}
+
+	for (i = 0; i < times; i++) {
+		ret = write_nointr(fd, "add\n", sizeof("add\n") - 1);
+		if (ret < 0) {
+			fprintf(stderr, "Failed to trigger uevent\n");
+			break;
+		}
+	}
+	close(fd);
+
+	return ret;
+}
+
+int set_death_signal(void)
+{
+	int ret;
+	pid_t ppid;
+
+	ret = prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
+
+	/* Check whether we have been orphaned. */
+	ppid = getppid();
+	if (ppid == 1) {
+		pid_t self;
+
+		self = getpid();
+		ret = kill(self, SIGKILL);
+	}
+
+	if (ret < 0)
+		return -1;
+
+	return 0;
+}
+
+static int do_test(unsigned long pre_flags, unsigned long post_flags,
+		   bool expect_uevent, int sync_fd)
+{
+	int ret;
+	uint64_t wait_val;
+	pid_t pid;
+	sigset_t mask;
+	sigset_t orig_mask;
+	struct timespec timeout;
+
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGCHLD);
+
+	ret = sigprocmask(SIG_BLOCK, &mask, &orig_mask);
+	if (ret < 0) {
+		fprintf(stderr, "%s- Failed to block SIGCHLD\n", strerror(errno));
+		return -1;
+	}
+
+	pid = fork();
+	if (pid < 0) {
+		fprintf(stderr, "%s - Failed to fork() new process\n", strerror(errno));
+		return -1;
+	}
+
+	if (pid == 0) {
+		/* Make sure that we go away when our parent dies. */
+		ret = set_death_signal();
+		if (ret < 0) {
+			fprintf(stderr, "Failed to set PR_SET_PDEATHSIG to SIGKILL\n");
+			_exit(EXIT_FAILURE);
+		}
+
+		if (pre_flags & CLONE_NEWUSER) {
+			ret = unshare(CLONE_NEWUSER);
+			if (ret < 0) {
+				fprintf(stderr,
+					"%s - Failed to unshare user namespace\n",
+					strerror(errno));
+				_exit(EXIT_FAILURE);
+			}
+		}
+
+		if (pre_flags & CLONE_NEWNET) {
+			ret = unshare(CLONE_NEWNET);
+			if (ret < 0) {
+				fprintf(stderr,
+					"%s - Failed to unshare network namespace\n",
+					strerror(errno));
+				_exit(EXIT_FAILURE);
+			}
+		}
+
+		if (uevent_listener(post_flags, expect_uevent, sync_fd) < 0)
+			_exit(EXIT_FAILURE);
+
+		_exit(EXIT_SUCCESS);
+	}
+
+	ret = read_nointr(sync_fd, &wait_val, sizeof(wait_val));
+	if (ret != sizeof(wait_val)) {
+		fprintf(stderr, "Failed to synchronize with child process\n");
+		_exit(EXIT_FAILURE);
+	}
+
+	/* Trigger 10 uevents to account for the case where the kernel might
+	 * drop some.
+	 */
+	ret = trigger_uevent(10);
+	if (ret < 0)
+		fprintf(stderr, "Failed triggering uevents\n");
+
+	/* Wait for 2 seconds before considering this failed. This should be
+	 * plenty of time for the kernel to deliver the uevent even under heavy
+	 * load.
+	 */
+	timeout.tv_sec = 2;
+	timeout.tv_nsec = 0;
+
+again:
+	ret = sigtimedwait(&mask, NULL, &timeout);
+	if (ret < 0) {
+		if (errno == EINTR)
+			goto again;
+
+		if (!expect_uevent)
+			ret = kill(pid, SIGTERM); /* success */
+		else
+			ret = kill(pid, SIGUSR1); /* error */
+		if (ret < 0)
+			return -1;
+	}
+
+	ret = wait_for_pid(pid);
+	if (ret < 0)
+		return -1;
+
+	return ret;
+}
+
+static void signal_handler(int sig)
+{
+	if (sig == SIGTERM)
+		_exit(EXIT_SUCCESS);
+
+	_exit(EXIT_FAILURE);
+}
+
+TEST(uevent_filtering)
+{
+	int ret, sync_fd;
+	struct sigaction act;
+
+	if (geteuid()) {
+		TH_LOG("Uevent filtering tests require root privileges. Skipping test");
+		_exit(KSFT_SKIP);
+	}
+
+	ret = access(__DEV_FULL, F_OK);
+	EXPECT_EQ(0, ret) {
+		if (errno == ENOENT) {
+			TH_LOG(__DEV_FULL " does not exist. Skipping test");
+			_exit(KSFT_SKIP);
+		}
+
+		_exit(KSFT_FAIL);
+	}
+
+	act.sa_handler = signal_handler;
+	act.sa_flags = 0;
+	sigemptyset(&act.sa_mask);
+
+	ret = sigaction(SIGTERM, &act, NULL);
+	ASSERT_EQ(0, ret);
+
+	sync_fd = eventfd(0, EFD_CLOEXEC);
+	ASSERT_GE(sync_fd, 0);
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in initial network namespace owned by
+	 *   initial user namespace.
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(0, 0, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in non-initial network namespace
+	 *   owned by initial user namespace.
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(CLONE_NEWNET, 0, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - unshare user namespace
+	 * - Open uevent listening socket in initial network namespace
+	 *   owned by initial user namespace.
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(CLONE_NEWUSER, 0, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in non-initial network namespace
+	 *   owned by non-initial user namespace.
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives no uevent
+	 */
+	ret = do_test(CLONE_NEWUSER | CLONE_NEWNET, 0, false, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in initial network namespace
+	 *   owned by initial user namespace.
+	 * - unshare network namespace
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(0, CLONE_NEWNET, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in initial network namespace
+	 *   owned by initial user namespace.
+	 * - unshare user namespace
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(0, CLONE_NEWUSER, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+	/*
+	 * Setup:
+	 * - Open uevent listening socket in initial network namespace
+	 *   owned by initial user namespace.
+	 * - unshare user namespace
+	 * - unshare network namespace
+	 * - Trigger uevent in initial network namespace owned by initial user
+	 *   namespace.
+	 * Expected Result:
+	 * - uevent listening socket receives uevent
+	 */
+	ret = do_test(0, CLONE_NEWUSER | CLONE_NEWNET, true, sync_fd);
+	ASSERT_EQ(0, ret) {
+		goto do_cleanup;
+	}
+
+do_cleanup:
+	close(sync_fd);
+}
+
+TEST_HARNESS_MAIN
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next v3 06/10] tools: bpftool: resolve calls without using imm field
From: Jakub Kicinski @ 2018-05-22 19:36 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao
In-Reply-To: <02b16a573269b492d4449c0e587ccc0020973e8c.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:09 +0530, Sandipan Das wrote:
> Currently, we resolve the callee's address for a JITed function
> call by using the imm field of the call instruction as an offset
> from __bpf_call_base. If bpf_jit_kallsyms is enabled, we further
> use this address to get the callee's kernel symbol's name.
> 
> For some architectures, such as powerpc64, the imm field is not
> large enough to hold this offset. So, instead of assigning this
> offset to the imm field, the verifier now assigns the subprog
> id. Also, a list of kernel symbol addresses for all the JITed
> functions is provided in the program info. We now use the imm
> field as an index for this list to lookup a callee's symbol's
> address and resolve its name.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [pull request][net-next 0/6] Mellanox, mlx5e updates 2018-05-19
From: David Miller @ 2018-05-22 19:38 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20180521210502.11082-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Mon, 21 May 2018 14:04:56 -0700

> This is a mlx5e only pull request, for more information please see tag
> log below.
> 
> Please pull and let me know if there's any problem.

The dcbnl vs. devlink shared buffer API issue needs to be discussed more
thoroughly.

Even if now changes happen to the code in the end, the results of the
discussion and example configurations for the new mechanism need to
be added to the commit message.  At a minimum.

Thanks.

^ permalink raw reply

* Re: [PATCH] pcnet32: add an error handling path in pcnet32_probe_pci()
From: David Miller @ 2018-05-22 19:40 UTC (permalink / raw)
  To: chenbo; +Cc: pcnet32, netdev, linux-kernel
In-Reply-To: <20180521214449.18516-1-chenbo@pdx.edu>

From: Bo Chen <chenbo@pdx.edu>
Date: Mon, 21 May 2018 14:44:49 -0700

> Make sure to invoke pci_disable_device() when errors occur in
> pcnet32_probe_pci().
> 
> Signed-off-by: Bo Chen <chenbo@pdx.edu>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v2 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Rob Herring @ 2018-05-22 19:40 UTC (permalink / raw)
  To: Michal Vokáč
  Cc: netdev, linux-kernel, devicetree, f.fainelli, vivien.didelot,
	andrew, mark.rutland, davem, michal.vokac
In-Reply-To: <1526987792-56861-2-git-send-email-michal.vokac@ysoft.com>

On Tue, May 22, 2018 at 01:16:26PM +0200, Michal Vokáč wrote:
> Add support for the four-port variant of the Qualcomm QCA833x switch.
> 
> The CPU port default link settings can be reconfigured using
> a fixed-link sub-node.
> 
> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
> ---
> Changes in v2:
>  - Add commit message and document fixed-link binding.
> 
>  .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> index 9c67ee4..15b9057 100644
> --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> @@ -2,7 +2,10 @@
>  
>  Required properties:
>  
> -- compatible: should be "qca,qca8337"
> +- compatible: should be one of:
> +    "qca,qca8334"
> +    "qca,qca8337"
> +
>  - #size-cells: must be 0
>  - #address-cells: must be 1
>  
> @@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle
>  referencing the internal PHY connected to it. The CPU port of this switch is
>  always port 0.
>  
> +A CPU port node has the following optional property:

s/property/node/

Otherwise,

Reviewed-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* RE: [PATCH net 1/1] qed: Fix mask for physical address in ILT entry
From: Shaikh, Shahed @ 2018-05-22 19:42 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Elior, Ariel, Dept-Eng Everest Linux L2
In-Reply-To: <20180522.153354.796330958257215059.davem@davemloft.net>



> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Tuesday, May 22, 2018 12:34 PM
> To: Shaikh, Shahed <Shahed.Shaikh@cavium.com>
> Cc: netdev@vger.kernel.org; Elior, Ariel <Ariel.Elior@cavium.com>; Dept-Eng
> Everest Linux L2 <Dept-EngEverestLinuxL2@cavium.com>
> Subject: Re: [PATCH net 1/1] qed: Fix mask for physical address in ILT entry
> 
> From: Shahed Shaikh <shahed.shaikh@cavium.com>
> Date: Mon, 21 May 2018 12:31:47 -0700
> 
> > ILT entry requires 12 bit right shifted physical address.
> > Existing mask for ILT entry of physical address i.e.
> > ILT_ENTRY_PHY_ADDR_MASK is not sufficient to handle 64bit address
> > because upper 8 bits of 64 bit address were getting masked which
> > resulted in completer abort error on PCIe bus due to invalid address.
> >
> > Fix that mask to handle 64bit physical address.
> >
> > Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support")
> >
> > Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com>
> > Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
> 
> Please do not put an empty line between Fixes: and other tags in the future.  I
> fixed it up this time.

Sorry about this. Thanks for fixing.

> 
> Applied, thanks.
Can you please queues this fix for -stable?

Thanks,
Shahed

^ permalink raw reply

* Re: [PATCH net-next 0/2] tcp: reduce quickack pressure for ECN
From: David Miller @ 2018-05-22 19:43 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, vanj, ncardwell, ycheng, soheil, eric.dumazet
In-Reply-To: <20180521220857.229273-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 21 May 2018 15:08:55 -0700

> Small patch series changing TCP behavior vs quickack and ECN
> 
> First patch is a refactoring, adding parameter to tcp_incr_quickack()
> and tcp_enter_quickack_mode() helpers.
> 
> Second patch implements the change, lowering number of ACK packets
> sent after an ECN event.

Series applied, thanks Eric.

^ permalink raw reply

* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-05-22
From: David Miller @ 2018-05-22 19:46 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180522174527.19680-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 22 May 2018 10:45:18 -0700

> This series contains updates to i40e only.
> 
> Jake provides all the changes in this series starting with making it
> consistent in how we approach the bit lock.  Fixed the reporting of the
> VEB statistics and the queue statistics to always return every queue
> even if it is not currently in use.  Use WARN_ONCE() so that the first
> time we end up with an incorrect size we will dump a stack trace and a
> message to help highlight the issue early in testing.  Folded the fixed
> string prefix into the stat string definition.  Instead of using a
> separate char *p pointer when copying strings, use the data pointer
> directly.  Added code comments for several of the statistic functions to
> better explain the number and ordering of statistics.

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [PATCH bpf-next v3 07/10] bpf: fix multi-function JITed dump obtained via syscall
From: Jakub Kicinski @ 2018-05-22 19:47 UTC (permalink / raw)
  To: Sandipan Das; +Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao
In-Reply-To: <6f245a366d5a2957e2256f4bd89ab56ade6508d5.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:10 +0530, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T11:43:38+0530  uid 0
>           xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T12:13:01+0530  uid 0
>           xlated 216B  jited 308B  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>      0:   nop
>      4:   nop
>      8:   mflr    r0
>      c:   std     r0,16(r1)
>     10:   stdu    r1,-112(r1)
>     14:   std     r31,104(r1)
>     18:   addi    r31,r1,48
>     1c:   li      r3,10
>   ...
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/syscall.c | 36 +++++++++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index f0ad4b5f0224..1c4cba91e523 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1970,13 +1970,43 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	 * for offload.
>  	 */
>  	ulen = info.jited_prog_len;
> -	info.jited_prog_len = prog->jited_len;
> +	if (prog->aux->func_cnt) {
> +		u32 i;
> +
> +		info.jited_prog_len = 0;
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			info.jited_prog_len += prog->aux->func[i]->jited_len;
> +	} else {
> +		info.jited_prog_len = prog->jited_len;
> +	}
> +
>  	if (info.jited_prog_len && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			uinsns = u64_to_user_ptr(info.jited_prog_insns);
>  			ulen = min_t(u32, info.jited_prog_len, ulen);
> -			if (copy_to_user(uinsns, prog->bpf_func, ulen))
> -				return -EFAULT;
> +
> +			/* for multi-function programs, copy the JITed
> +			 * instructions for all the functions
> +			 */
> +			if (prog->aux->func_cnt) {
> +				u32 len, free, i;
> +				u8 *img;
> +
> +				free = ulen;
> +				for (i = 0; i < prog->aux->func_cnt; i++) {
> +					len = prog->aux->func[i]->jited_len;
> +					img = (u8 *) prog->aux->func[i]->bpf_func;
> +					if (len > free)
> +						break;

nit: interesting, the previous code used to fill up the space
completely, I would personally vote to keep that behaviour and do:

    len = min(len, free);
    copy();
    free -= len;
    if (!free)
        break;

otherwise the user space doesn't know when to stop disassembling
truncated output.  But that's really a corner case, so not sure we care.

> +					if (copy_to_user(uinsns, img, len))
> +						return -EFAULT;
> +					uinsns += len;
> +					free -= len;
> +				}
> +			} else {
> +				if (copy_to_user(uinsns, prog->bpf_func, ulen))
> +					return -EFAULT;
> +			}
>  		} else {
>  			info.jited_prog_insns = 0;
>  		}

^ permalink raw reply

* Re: [PATCH net 1/1] qed: Fix mask for physical address in ILT entry
From: David Miller @ 2018-05-22 19:48 UTC (permalink / raw)
  To: Shahed.Shaikh; +Cc: netdev, Ariel.Elior, Dept-EngEverestLinuxL2
In-Reply-To: <DM2PR07MB154780CFD55C394E7D3212E89D940@DM2PR07MB1547.namprd07.prod.outlook.com>

From: "Shaikh, Shahed" <Shahed.Shaikh@cavium.com>
Date: Tue, 22 May 2018 19:42:46 +0000

> Can you please queues this fix for -stable?

I did, see:

http://patchwork.ozlabs.org/bundle/davem/stable/?series=&submitter=&state=*&q=&archive=

^ permalink raw reply

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-05-22 19:54 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Sridhar Samudrala, stephen, davem, netdev, virtualization,
	virtio-dev, jesse.brandeburg, alexander.h.duyck, kubakici,
	jasowang, loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522173844.GP2149@nanopsycho>

On Tue, May 22, 2018 at 07:38:44PM +0200, Jiri Pirko wrote:
> >> >> In private
> >> >> flag. I don't see no reason to break this pattern here.
> >> >
> >> >Other masters are setup from userspace, this one is set up automatically
> >> >by kernel. So the bar is higher, we need an interface that existing
> >> >userspace knows about.  We can't just say "oh if userspace set this up
> >> >it should know to skip lowerdevs".
> >> >
> >> >Otherwise multiple interfaces with same mac tend to confuse userspace.
> >> 
> >> No difference, really.
> >> Regardless who does the setup, and independent userspace deamon should
> >> react accordingly.
> >
> >If the deamon does the setup itself, it's reasonable to require that it
> >learns about new flags each time we add a new driver.  If it doesn't,
> >then I think it's less reasonable.
> 
> No need. The "IFLA_MASTER" attr is always there to be looked at. That is
> enough.

Oh so if it has an master, skip it? Sorry, I misunderstood what you were
saying earlier.

Thanks, this makes sense to me.

-- 
MST

^ permalink raw reply

* Re: [PATCH bpf-next v3 10/10] tools: bpftool: add delimiters to multi-function JITed dumps
From: Jakub Kicinski @ 2018-05-22 19:55 UTC (permalink / raw)
  To: Sandipan Das
  Cc: ast, daniel, netdev, linuxppc-dev, mpe, naveen.n.rao,
	Quentin Monnet
In-Reply-To: <88b61b11ebca5b44bad0c34225b6f2383e5983a5.1527008647.git.sandipan@linux.vnet.ibm.com>

On Tue, 22 May 2018 22:46:13 +0530, Sandipan Das wrote:
> +		if (info.nr_jited_func_lens && info.jited_func_lens) {
> +			struct kernel_sym *sym = NULL;
> +			unsigned char *img = buf;
> +			__u64 *ksyms = NULL;
> +			__u32 *lens;
> +			__u32 i;
> +
> +			if (info.nr_jited_ksyms) {
> +				kernel_syms_load(&dd);
> +				ksyms = (__u64 *) info.jited_ksyms;
> +			}
> +
> +			lens = (__u32 *) info.jited_func_lens;
> +			for (i = 0; i < info.nr_jited_func_lens; i++) {
> +				if (ksyms) {
> +					sym = kernel_syms_search(&dd, ksyms[i]);
> +					if (sym)
> +						printf("%s:\n", sym->name);
> +					else
> +						printf("%016llx:\n", ksyms[i]);
> +				}
> +
> +				disasm_print_insn(img, lens[i], opcodes, name);
> +				img += lens[i];
> +				printf("\n");
> +			}
> +		} else {

The output doesn't seem to be JSON-compatible :(  We try to make sure
all bpftool command can produce valid JSON when run with -j (or -p)
switch.

Would it be possible to make each function a separate JSON object with
"name" and "insn" array?  Would that work?

^ permalink raw reply

* Re: Regression: Approximate 34% performance hit in receive throughput over ixgbe seen due to build_skb patch
From: Alexander Duyck @ 2018-05-22 20:03 UTC (permalink / raw)
  To: William Kucharski
  Cc: LKML, Netdev, intel-wired-lan, Jeff Kirsher, Duyck, Alexander H
In-Reply-To: <4F646FBB-FE0B-4FEE-98E5-3CA2DF0598DE@oracle.com>

On Tue, May 22, 2018 at 12:29 PM, William Kucharski
<william.kucharski@oracle.com> wrote:
>
>
>> On May 22, 2018, at 12:23 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>
>> 3. There should be a private flag that can be updated via "ethtool
>> --set-priv-flags" called "legacy-rx" that you can enable that will
>> roll back to the original that did the copy-break type approach for
>> small packets and the headers of the frame.
>
> With legacy-rx enabled, most of the regression goes away, but it's still present
> as compared to the code without the patch; the regression then drops to about 6%:
>
> # ethtool --show-priv-flags eno1
> Private flags for eno1:
> legacy-rx: on
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     35934709      0     306.64
>  65536           60.00     33791739            288.35
>
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
>
>  65536      64   60.00     39254351      0     334.97
>  65536           60.00     36761069            313.69
>
> Is this variance to be expected, or do you think modification of the
> interrupt delay would achieve better results?
>
>
>     William Kucharski
>

I would think with modification of interrupt delay you could probably
do much better if my assumption is correct and the issue is us sitting
on packets for too long so we overrun the socket buffer and start
dropping packets or stalling the Tx.

Thanks.

- Alex

^ permalink raw reply

* Re: [Cake] [PATCH net-next v14 6/7] sch_cake: Add overhead compensation support to the rate shaper
From: Marcelo Ricardo Leitner @ 2018-05-22 20:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: netdev, cake
In-Reply-To: <52F9132E-4FDC-495A-A020-BCD963B3E3CF@toke.dk>

On Tue, May 22, 2018 at 10:44:53AM +0200, Toke Høiland-Jørgensen wrote:
> 
> 
> On 22 May 2018 01:45:13 CEST, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> >On Mon, May 21, 2018 at 10:35:58PM +0200, Toke Høiland-Jørgensen wrote:
> >> +static u32 cake_overhead(struct cake_sched_data *q, const struct
> >sk_buff *skb)
> >> +{
> >> +	const struct skb_shared_info *shinfo = skb_shinfo(skb);
> >> +	unsigned int hdr_len, last_len = 0;
> >> +	u32 off = skb_network_offset(skb);
> >> +	u32 len = qdisc_pkt_len(skb);
> >> +	u16 segs = 1;
> >> +
> >> +	q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
> >> +
> >> +	if (!shinfo->gso_size)
> >> +		return cake_calc_overhead(q, len, off);
> >> +
> >> +	/* borrowed from qdisc_pkt_len_init() */
> >> +	hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
> >> +
> >> +	/* + transport layer */
> >> +	if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 |
> >> +						SKB_GSO_TCPV6))) {
> >> +		const struct tcphdr *th;
> >> +		struct tcphdr _tcphdr;
> >> +
> >> +		th = skb_header_pointer(skb, skb_transport_offset(skb),
> >> +					sizeof(_tcphdr), &_tcphdr);
> >> +		if (likely(th))
> >> +			hdr_len += __tcp_hdrlen(th);
> >> +	} else {
> >
> >I didn't see some code limiting GSO packets to just TCP or UDP. Is it
> >safe to assume that this packet is an UDP one, and not SCTP or ESP,
> >for example?
> 
> As the comment says, I nicked this from the qdisc init code.
> So I assume it's safe? :)

As long as it doesn't go further than this, it is. As in, it is just
validating if it can contain an UDP header, and if so, account for its
size, without actually reading the header.

Considering everything !TCP as UDP work as an approximation, which is
quite accurate. SCTP header is just 4 bytes bigger than UDP header and
is equal to ESP header size.

> 
> >> +		struct udphdr _udphdr;
> >> +
> >> +		if (skb_header_pointer(skb, skb_transport_offset(skb),
> >> +				       sizeof(_udphdr), &_udphdr))
> >> +			hdr_len += sizeof(struct udphdr);
> >> +	}
> >> +
> >> +	if (unlikely(shinfo->gso_type & SKB_GSO_DODGY))
> >> +		segs = DIV_ROUND_UP(skb->len - hdr_len,
> >> +				    shinfo->gso_size);
> >> +	else
> >> +		segs = shinfo->gso_segs;
> >> +
> >> +	len = shinfo->gso_size + hdr_len;
> >> +	last_len = skb->len - shinfo->gso_size * (segs - 1);
> >> +
> >> +	return (cake_calc_overhead(q, len, off) * (segs - 1) +
> >> +		cake_calc_overhead(q, last_len, off));
> >> +}
> >> +
> 

^ permalink raw reply

* [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 20:38 UTC (permalink / raw)
  To: davem, dledford, jgg
  Cc: Sindhu Devale, netdev, linux-rdma, nhorman, sassmann, jogreene,
	Shiraz Saleem, Jeff Kirsher

From: Sindhu Devale <sindhu.devale@intel.com>

Currently i40iw is dependent on i40e symbols
i40e_register_client and i40e_unregister_client due to
which i40iw cannot be loaded without i40e being loaded.

This patch allows RDMA driver to build and load without
linking to LAN driver and without LAN driver being loaded
first. Once the LAN driver is loaded, the RDMA driver
is notified through the netdevice notifiers to register
as client to the LAN driver. Add function pointers to IDC
register/unregister in the private VSI structure. This
allows a RDMA driver to build without linking to i40e.

Signed-off-by: Sindhu Devale <sindhu.devale@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/infiniband/hw/i40iw/i40iw.h           |  24 +++
 drivers/infiniband/hw/i40iw/i40iw_main.c      | 141 ++++++++++++++++--
 drivers/infiniband/hw/i40iw/i40iw_utils.c     |   5 +-
 drivers/net/ethernet/intel/i40e/i40e.h        |   1 +
 drivers/net/ethernet/intel/i40e/i40e_client.h |   9 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   6 +
 6 files changed, 173 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw.h b/drivers/infiniband/hw/i40iw/i40iw.h
index d5d8c1be345a..c6398b73a8da 100644
--- a/drivers/infiniband/hw/i40iw/i40iw.h
+++ b/drivers/infiniband/hw/i40iw/i40iw.h
@@ -119,6 +119,30 @@
 #define I40IW_CQP_COMPL_SQ_WQE_FLUSHED    3
 #define I40IW_CQP_COMPL_RQ_SQ_WQE_FLUSHED 4
 
+enum I40IW_IDC_STATE {
+	I40IW_STATE_INVALID,
+	I40IW_STATE_VALID,
+	I40IW_STATE_REG_FAILED
+};
+
+struct i40iw_peer {
+	struct module *module;
+#define MAX_PEER_NAME_SIZE 8
+	char name[MAX_PEER_NAME_SIZE];
+	enum I40IW_IDC_STATE state;
+	atomic_t ref_count;
+	int (*idc_reg_peer_driver)(struct i40e_client *i40iw_client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *i40iw_client);
+};
+
+struct i40iw_peer_drv {
+	struct i40e_client i40iw_client;
+	struct i40iw_peer peer;
+};
+
+bool i40iw_is_new_peer(struct net_device *netdev);
+void i40iw_reg_peer(void);
+
 struct i40iw_cqp_compl_info {
 	u32 op_ret_val;
 	u16 maj_err_code;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_main.c b/drivers/infiniband/hw/i40iw/i40iw_main.c
index 9cd0d3ef9057..f4c5be11c1d4 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_main.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_main.c
@@ -78,6 +78,7 @@ MODULE_AUTHOR("Intel Corporation, <e1000-rdma@lists.sourceforge.net>");
 MODULE_DESCRIPTION("Intel(R) Ethernet Connection X722 iWARP RDMA Driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
+static struct i40iw_peer_drv peer_drv;
 static struct i40e_client i40iw_client;
 static char i40iw_client_name[I40E_CLIENT_STR_LENGTH] = "i40iw";
 
@@ -103,6 +104,30 @@ static struct notifier_block i40iw_netdevice_notifier = {
 	.notifier_call = i40iw_netdevice_event
 };
 
+/**
+ * i40iw_open_inc_ref - Increment ref count for a open
+ */
+static void i40iw_open_inc_ref(void)
+{
+	atomic_inc(&peer_drv.peer.ref_count);
+}
+
+/**
+ * i40iw_open_dec_ref - Decrement ref count for a open
+ */
+static void i40iw_open_dec_ref(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID &&
+	    atomic_dec_and_test(&peer->ref_count)) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_find_i40e_handler - find a handler given a client info
  * @ldev: pointer to a client info
@@ -1710,6 +1735,7 @@ static int i40iw_open(struct i40e_info *ldev, struct i40e_client *client)
 		if(iwdev->param_wq == NULL)
 			break;
 		i40iw_pr_info("i40iw_open completed\n");
+		i40iw_open_inc_ref();
 		return 0;
 	} while (0);
 
@@ -1801,6 +1827,7 @@ static void i40iw_close(struct i40e_info *ldev, struct i40e_client *client, bool
 	i40iw_cm_teardown_connections(iwdev, NULL, NULL, true);
 	destroy_workqueue(iwdev->virtchnl_wq);
 	i40iw_deinit_device(iwdev);
+	i40iw_open_dec_ref();
 }
 
 /**
@@ -2024,6 +2051,104 @@ static const struct i40e_client_ops i40e_ops = {
 	.vf_capable = i40iw_vf_capable
 };
 
+/**
+ * i40iw_is_new_peer - check netdev of the peer driver
+ * @netdev: netdev of peer driver
+ */
+bool i40iw_is_new_peer(struct net_device *netdev)
+{
+	struct idc_srv_provider *sp;
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID)
+		return false;
+
+	if (netdev->dev.parent && netdev->dev.parent->driver &&
+	    !strncmp(netdev->dev.parent->driver->name, peer->name, sizeof(peer->name))) {
+		sp = (struct idc_srv_provider *)netdev_priv(netdev);
+		if (sp->signature != IDC_SIGNATURE || sp->version)
+			return false;
+
+		/* Found the driver */
+		peer->idc_reg_peer_driver = sp->idc_reg_peer_driver;
+		peer->idc_unreg_peer_driver = sp->idc_unreg_peer_driver;
+		peer->module = netdev->dev.parent->driver->owner;
+
+		return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40iw_initialize_client - Setup client struct
+ */
+static void i40iw_initialize_client(void)
+{
+	struct i40e_client *i40iw_client = &peer_drv.i40iw_client;
+
+	i40iw_client->version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
+	i40iw_client->version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
+	i40iw_client->version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
+	i40iw_client->ops = &i40e_ops;
+	memcpy(i40iw_client->name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
+	i40iw_client->type = I40E_CLIENT_IWARP;
+	strncpy(peer_drv.peer.name, "i40e", sizeof(peer_drv.peer.name));
+}
+
+/**
+ * i40iw_reg_peer - Register with peer
+ */
+void i40iw_reg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+
+	if (peer->state == I40IW_STATE_VALID)
+		return;
+
+	if (peer->idc_reg_peer_driver &&
+	    !peer->idc_reg_peer_driver(&peer_drv.i40iw_client)) {
+		peer->state = I40IW_STATE_VALID;
+		try_module_get(peer->module);
+	} else {
+		peer->state = I40IW_STATE_REG_FAILED;
+	}
+}
+
+/**
+ * i40iw_find_idc_peer - Search netdevs for a peer driver
+ */
+static void i40iw_find_idc_peer(void)
+{
+	struct net_device *dev;
+
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
+		if (i40iw_is_new_peer(dev))
+			break;
+	}
+	rcu_read_unlock();
+	i40iw_reg_peer();
+}
+
+/**
+ * i40iw_unreg_peer - Unregister with peer
+ */
+static void i40iw_unreg_peer(void)
+{
+	struct i40iw_peer *peer;
+
+	peer = &peer_drv.peer;
+	if (peer->state == I40IW_STATE_VALID) {
+		peer->state = I40IW_STATE_INVALID;
+		peer->idc_unreg_peer_driver(&peer_drv.i40iw_client);
+		module_put(peer->module);
+	}
+}
+
 /**
  * i40iw_init_module - driver initialization function
  *
@@ -2032,20 +2157,12 @@ static const struct i40e_client_ops i40e_ops = {
  */
 static int __init i40iw_init_module(void)
 {
-	int ret;
-
-	memset(&i40iw_client, 0, sizeof(i40iw_client));
-	i40iw_client.version.major = CLIENT_IW_INTERFACE_VERSION_MAJOR;
-	i40iw_client.version.minor = CLIENT_IW_INTERFACE_VERSION_MINOR;
-	i40iw_client.version.build = CLIENT_IW_INTERFACE_VERSION_BUILD;
-	i40iw_client.ops = &i40e_ops;
-	memcpy(i40iw_client.name, i40iw_client_name, I40E_CLIENT_STR_LENGTH);
-	i40iw_client.type = I40E_CLIENT_IWARP;
 	spin_lock_init(&i40iw_handler_lock);
-	ret = i40e_register_client(&i40iw_client);
+	i40iw_initialize_client();
+	i40iw_find_idc_peer();
 	i40iw_register_notifiers();
 
-	return ret;
+	return 0;
 }
 
 /**
@@ -2057,7 +2174,7 @@ static int __init i40iw_init_module(void)
 static void __exit i40iw_exit_module(void)
 {
 	i40iw_unregister_notifiers();
-	i40e_unregister_client(&i40iw_client);
+	i40iw_unreg_peer();
 }
 
 module_init(i40iw_init_module);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_utils.c b/drivers/infiniband/hw/i40iw/i40iw_utils.c
index a9ea966877f2..264939942da0 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_utils.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_utils.c
@@ -314,8 +314,11 @@ int i40iw_netdevice_event(struct notifier_block *notifier,
 	event_netdev = netdev_notifier_info_to_dev(ptr);
 
 	hdl = i40iw_find_netdev(event_netdev);
-	if (!hdl)
+	if (!hdl) {
+		if (i40iw_is_new_peer(event_netdev))
+			i40iw_reg_peer();
 		return NOTIFY_DONE;
+	}
 
 	iwdev = &hdl->device;
 	if (iwdev->init_state < RDMA_DEV_REGISTERED || iwdev->closing)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 7a80652e2500..e3171b696848 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -789,6 +789,7 @@ struct i40e_vsi {
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
+	struct idc_srv_provider prov_callbacks;
 	struct i40e_vsi *vsi;
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.h b/drivers/net/ethernet/intel/i40e/i40e_client.h
index 72994baf4941..95a47df9c104 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_client.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_client.h
@@ -44,6 +44,15 @@ struct i40e_client;
 #define I40E_QUEUE_TYPE_PE_AEQ  0x80
 #define I40E_QUEUE_INVALID_IDX	0xFFFF
 
+#define IDC_SIGNATURE 0x494e54454c494443ULL	/* INTELIDC */
+struct idc_srv_provider {
+	u64 signature;
+	u8 version;
+	u8 rsvd[7];
+	int (*idc_reg_peer_driver)(struct i40e_client *client);
+	int (*idc_unreg_peer_driver)(struct i40e_client *client);
+};
+
 struct i40e_qv_info {
 	u32 v_idx; /* msix_vector */
 	u16 ceq_idx;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5daa5c9c7de..984001ae7680 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11913,6 +11913,12 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	np = netdev_priv(netdev);
 	np->vsi = vsi;
 
+	np->prov_callbacks.signature = IDC_SIGNATURE;
+	np->prov_callbacks.version = 0;
+	memset(np->prov_callbacks.rsvd, 0, sizeof(np->prov_callbacks.rsvd));
+	np->prov_callbacks.idc_reg_peer_driver = i40e_register_client;
+	np->prov_callbacks.idc_unreg_peer_driver = i40e_unregister_client;
+
 	hw_enc_features = NETIF_F_SG			|
 			  NETIF_F_IP_CSUM		|
 			  NETIF_F_IPV6_CSUM		|
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH 07/33] iwlwifi: mvm: use match_string() helper
From: Andy Shevchenko @ 2018-05-22 20:41 UTC (permalink / raw)
  To: Yisheng Xie, Luca Coelho
  Cc: Linux Kernel Mailing List, Kalle Valo, Intel Linux Wireless,
	Johannes Berg, Emmanuel Grumbach, open list:TI WILINK WIRELES...,
	netdev
In-Reply-To: <63a78572-b7d3-9cc7-9e22-5bd19cad3333@huawei.com>

On Tue, May 22, 2018 at 6:30 AM, Yisheng Xie <xieyisheng1@huawei.com> wrote:

>> But it's up tu Loca.

Shame on me. I meant Luca, of course!
Luca, sorry.

> OK, I will change it if Loca agree your opinion.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH net-next 0/7] net/ipv6: Fix route append and replace use cases
From: David Ahern @ 2018-05-22 20:44 UTC (permalink / raw)
  To: David Miller, dsahern; +Cc: netdev, Thomas.Winter, idosch, sharpd, roopa
In-Reply-To: <20180522.144617.1003575784676243779.davem@davemloft.net>

On 5/22/18 12:46 PM, David Miller wrote:
> 
> Ok, I'll apply this series.
> 
> But if this breaks things for anyone in a practical way, I am unfortunately
> going to have to revert no matter how silly the current behavior may be.
> 

Understood. I have to try the best option first. I'll look at
regressions if they happen.

^ permalink raw reply

* [PATCH net] net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
From: Roopa Prabhu @ 2018-05-22 20:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
 net/ipv4/fib_frontend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 4d622112..e66172a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -649,6 +649,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
 	[RTA_ENCAP]		= { .type = NLA_NESTED },
 	[RTA_UID]		= { .type = NLA_U32 },
 	[RTA_MARK]		= { .type = NLA_U32 },
+	[RTA_TABLE]		= { .type = NLA_U32 },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next] enic: set DMA mask to 47 bit
From: Govindarajulu Varadarajan @ 2018-05-22 13:37 UTC (permalink / raw)
  To: davem, netdev; +Cc: benve, Govindarajulu Varadarajan

In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then
failover to DMA") DMA mask was changed from 40 bits to 64 bits.
Hardware actually supports only 47 bits.

Fixes: 624dbf55a359b("driver/net: enic: Try DMA 64 first, then failover
to DMA")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_main.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 81684acf52af..8a8b12b720ef 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -2747,11 +2747,11 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_master(pdev);
 
 	/* Query PCI controller on system for DMA addressing
-	 * limitation for the device.  Try 64-bit first, and
+	 * limitation for the device.  Try 47-bit first, and
 	 * fail to 32-bit.
 	 */
 
-	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
+	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(47));
 	if (err) {
 		err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
 		if (err) {
@@ -2765,10 +2765,10 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 			goto err_out_release_regions;
 		}
 	} else {
-		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
+		err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(47));
 		if (err) {
 			dev_err(dev, "Unable to obtain %u-bit DMA "
-				"for consistent allocations, aborting\n", 64);
+				"for consistent allocations, aborting\n", 47);
 			goto err_out_release_regions;
 		}
 		using_dac = 1;
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next v2 1/7] net: dsa: qca8k: Add QCA8334 binding documentation
From: Michal @ 2018-05-22 20:50 UTC (permalink / raw)
  To: Rob Herring
  Cc: netdev, linux-kernel, devicetree, f.fainelli, vivien.didelot,
	andrew, mark.rutland, davem, michal.vokac
In-Reply-To: <20180522194039.GA15413@rob-hp-laptop>

On 22.5.2018 21:40, Rob Herring wrote:
> On Tue, May 22, 2018 at 01:16:26PM +0200, Michal Vokáč wrote:
>> Add support for the four-port variant of the Qualcomm QCA833x switch.
>>
>> The CPU port default link settings can be reconfigured using
>> a fixed-link sub-node.
>>
>> Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
>> ---
>> Changes in v2:
>>   - Add commit message and document fixed-link binding.
>>
>>   .../devicetree/bindings/net/dsa/qca8k.txt          | 23 +++++++++++++++++++++-
>>   1 file changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> index 9c67ee4..15b9057 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
>> @@ -2,7 +2,10 @@
>>   
>>   Required properties:
>>   
>> -- compatible: should be "qca,qca8337"
>> +- compatible: should be one of:
>> +    "qca,qca8334"
>> +    "qca,qca8337"
>> +
>>   - #size-cells: must be 0
>>   - #address-cells: must be 1
>>   
>> @@ -14,6 +17,20 @@ port and PHY id, each subnode describing a port needs to have a valid phandle
>>   referencing the internal PHY connected to it. The CPU port of this switch is
>>   always port 0.
>>   
>> +A CPU port node has the following optional property:
> 
> s/property/node/
> 
> Otherwise,
> 
> Reviewed-by: Rob Herring <robh@kernel.org>

Good catch, I will correct this.
Thanks for the review Rob.

Michal

^ permalink raw reply

* Re: [PATCH net-next v11 2/5] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-05-22 20:54 UTC (permalink / raw)
  To: Jiri Pirko, Michael S. Tsirkin
  Cc: stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown, anjali.singhai
In-Reply-To: <20180522161246.GN2149@nanopsycho>



On 5/22/2018 9:12 AM, Jiri Pirko wrote:
> Fixing the subj, sorry about that.
>
> Tue, May 22, 2018 at 05:46:21PM CEST, mst@redhat.com wrote:
>> On Tue, May 22, 2018 at 05:36:14PM +0200, Jiri Pirko wrote:
>>> Tue, May 22, 2018 at 05:28:42PM CEST, sridhar.samudrala@intel.com wrote:
>>>> On 5/22/2018 2:08 AM, Jiri Pirko wrote:
>>>>> Tue, May 22, 2018 at 11:06:37AM CEST, jiri@resnulli.us wrote:
>>>>>> Tue, May 22, 2018 at 04:06:18AM CEST, sridhar.samudrala@intel.com wrote:
>>>>>>> Use the registration/notification framework supported by the generic
>>>>>>> failover infrastructure.
>>>>>>>
>>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>>>> In previous patchset versions, the common code did
>>>>>> netdev_rx_handler_register() and netdev_upper_dev_link() etc
>>>>>> (netvsc_vf_join()). Now, this is still done in netvsc. Why?
>>>>>>
>>>>>> This should be part of the common "failover" code.
>>>> Based on Stephen's feedback on earlier patches, i tried to minimize the changes to
>>>> netvsc and only commonize the notifier and the main event handler routine.
>>>> Another complication is that netvsc does part of registration in a delayed workqueue.
>>> :( This kind of degrades the whole efford of having single solution
>>> in "failover" module. I think that common parts, as
>>> netdev_rx_handler_register() and others certainly is should be inside
>>> the common module. This is not a good time to minimize changes. Let's do
>>> the thing properly and fix the netvsc mess now.
>>>
>>>
>>>> It should be possible to move some of the code from net_failover.c to generic
>>>> failover.c in future if Stephen is ok with it.
>>>>
>>>>
>>>>> Also note that in the current patchset you use IFF_FAILOVER flag for
>>>>> master, yet for the slave you use IFF_SLAVE. That is wrong.
>>>>> IFF_FAILOVER_SLAVE should be used.
>>>> Not sure which code you are referring to.  I only set IFF_FAILOVER_SLAVE
>>>> in patch 3.
>>> The existing netvsc driver.
>> We really can't change netvsc's flags now, even if it's interface is
>> messy, it's being used in the field. We can add a flag that makes netvsc
>> behave differently, and if this flag also allows enhanced functionality
>> userspace will gradually switch.
> Okay, although in this case, it really does not make much sense, so be
> it. Leave the netvsc set the ->priv flag to IFF_SLAVE as it is doing
> now. (This once-wrong-forever-wrong policy is flustrating me).
>
> But since this patchset introduces private flag IFF_FAILOVER and
> IFF_FAILOVER_SLAVE, and we set IFF_FAILOVER to the netvsc netdev
> instance, we should also set IFF_FAILOVER_SLAVE to the enslaved VF
> netdevice to get at least some consistency between virtio_net and
> netvsc.

OK. I can make this change to set/unset IFF_FAILOVER_SLAVE in the netvsc
register/unregister routines so that it is consistent with virtio_net.

Based on your discussion with mst, i think we can even remove IFF_SLAVE
setting on netvsc as it should not impact userspace.  If Stephen is OK
we can make this change too.

Do you see any other items that need to be resolved for this series to go in
this merge window?



>
>> Anything breaking userspace I fully expect Stephen to nack and
>> IMO with good reason.
>>
>> -- 
>> MST

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jason Gunthorpe @ 2018-05-22 20:56 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522203831.20624-1-jeffrey.t.kirsher@intel.com>

On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> From: Sindhu Devale <sindhu.devale@intel.com>
> 
> Currently i40iw is dependent on i40e symbols
> i40e_register_client and i40e_unregister_client due to
> which i40iw cannot be loaded without i40e being loaded.
> 
> This patch allows RDMA driver to build and load without
> linking to LAN driver and without LAN driver being loaded
> first. Once the LAN driver is loaded, the RDMA driver
> is notified through the netdevice notifiers to register
> as client to the LAN driver. Add function pointers to IDC
> register/unregister in the private VSI structure. This
> allows a RDMA driver to build without linking to i40e.

Why would you want to do this? The rdma driver is non-functional
without the ethernet driver, so why on earth would we want to defeat
the module dependency mechanism?

Jason

^ permalink raw reply

* Re: [net-next] i40iw/i40e: Remove link dependency on i40e
From: Jeff Kirsher @ 2018-05-22 21:04 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: davem, dledford, Sindhu Devale, netdev, linux-rdma, nhorman,
	sassmann, jogreene, Shiraz Saleem
In-Reply-To: <20180522205612.GD7502@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On Tue, 2018-05-22 at 14:56 -0600, Jason Gunthorpe wrote:
> On Tue, May 22, 2018 at 01:38:31PM -0700, Jeff Kirsher wrote:
> > From: Sindhu Devale <sindhu.devale@intel.com>
> > 
> > Currently i40iw is dependent on i40e symbols
> > i40e_register_client and i40e_unregister_client due to
> > which i40iw cannot be loaded without i40e being loaded.
> > 
> > This patch allows RDMA driver to build and load without
> > linking to LAN driver and without LAN driver being loaded
> > first. Once the LAN driver is loaded, the RDMA driver
> > is notified through the netdevice notifiers to register
> > as client to the LAN driver. Add function pointers to IDC
> > register/unregister in the private VSI structure. This
> > allows a RDMA driver to build without linking to i40e.
> 
> Why would you want to do this? The rdma driver is non-functional
> without the ethernet driver, so why on earth would we want to defeat
> the module dependency mechanism?

This change is driven by the OSV's like Red Hat, where customer's were
updating the i40e driver, which in turn broke i40iw.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH net-next v5 0/3] fib rule selftest
From: Roopa Prabhu @ 2018-05-22 21:03 UTC (permalink / raw)
  To: davem; +Cc: netdev, nikolay, dsa, idosch, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

This series adds a new test to test fib rules.
ip route get is used to test fib rule matches.
This series also extends ip route get to match on
sport and dport to test recent support of sport
and dport fib rule match.

v2 - address ido's commemt to make sport dport
ip route get to work correctly for input route
get. I don't support ip route get on ip-proto match yet.
ip route get creates a udp packet and i have left
it at that. We could extend ip route get to support
a few ip proto matches in followup patches.

v3 - Support ip_proto (only tcp and udp) match in getroute.
dropped printing of new match attrs in ip route get, 
because ipv6 does not print it. And ipv6 currrently shares
the dump api with ipv6 notify and its better to not add them
to the notify api. dropped it to keep the api consistent between
ipv4 and ipv6 (though uid is already printed in the ipv4 case).
If we need it, both ipv4 and ipv6 can be enhanced to provide
a separate get api. Moved skb creation for ipv4 to a separate func.

v4 - drop separate skb for netlink and fix concerns around rcu and netlink
     reply (as pointed out by DaveM). I now try to reset the skb after the route
     lookup and before the netlink send (testing shows this is ok. More eyes and
     any feedback here will be helpful)

v5 - dropped RTA_TABLE ipv4_rtm_policy update from this series and posted
     it separately for net (feedback from Eric)

Roopa Prabhu (3):
  ipv4: support sport, dport and ip_proto in RTM_GETROUTE
  ipv6: support sport, dport and ip_proto in RTM_GETROUTE
  selftests: net: initial fib rule tests

 include/uapi/linux/rtnetlink.h                |   2 +
 net/ipv4/route.c                              | 152 ++++++++++++-----
 net/ipv6/route.c                              |  25 +++
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/fib_rule_tests.sh | 224 ++++++++++++++++++++++++++
 5 files changed, 366 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/net/fib_rule_tests.sh

-- 
2.1.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox