Netdev List
 help / color / mirror / Atom feed
* [PATCH net 2/3] net: dsa: sja1105: fix checks for VLAN state in redirect action
From: Vladimir Oltean @ 2020-06-16 23:58 UTC (permalink / raw)
  To: netdev; +Cc: andrew, f.fainelli, vivien.didelot, kuba
In-Reply-To: <20200616235843.756413-1-olteanv@gmail.com>

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This action requires the VLAN awareness state of the switch to be of the
same type as the key that's being added:

- If the switch is unaware of VLAN, then the tc filter key must only
  contain the destination MAC address.
- If the switch is VLAN-aware, the key must also contain the VLAN ID and
  PCP.

But this check doesn't work unless we verify the VLAN awareness state on
both the "if" and the "else" branches.

Fixes: dfacc5a23e22 ("net: dsa: sja1105: support flow-based redirection via virtual links")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105_vl.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c b/drivers/net/dsa/sja1105/sja1105_vl.c
index 32eca3e660e1..a176f39a052b 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -342,7 +342,9 @@ int sja1105_vl_redirect(struct sja1105_private *priv, int port,
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Can only redirect based on DMAC");
 		return -EOPNOTSUPP;
-	} else if (key->type != SJA1105_KEY_VLAN_AWARE_VL) {
+	} else if ((priv->vlan_state == SJA1105_VLAN_BEST_EFFORT ||
+		    priv->vlan_state == SJA1105_VLAN_FILTERING_FULL) &&
+		   key->type != SJA1105_KEY_VLAN_AWARE_VL) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Can only redirect based on {DMAC, VID, PCP}");
 		return -EOPNOTSUPP;
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Matthew Wilcox @ 2020-06-17  0:37 UTC (permalink / raw)
  To: dsterba, Joe Perches, Waiman Long, Andrew Morton, David Howells,
	Jarkko Sakkinen, James Morris, Serge E. Hallyn, Linus Torvalds,
	David Rientjes, Michal Hocko, Johannes Weiner, Dan Carpenter,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <20200616230130.GJ27795@twin.jikos.cz>

On Wed, Jun 17, 2020 at 01:01:30AM +0200, David Sterba wrote:
> On Tue, Jun 16, 2020 at 11:53:50AM -0700, Joe Perches wrote:
> > On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
> > >  v4:
> > >   - Break out the memzero_explicit() change as suggested by Dan Carpenter
> > >     so that it can be backported to stable.
> > >   - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
> > >     now as there can be a bit more discussion on what is best. It will be
> > >     introduced as a separate patch later on after this one is merged.
> > 
> > To this larger audience and last week without reply:
> > https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/
> > 
> > Are there _any_ fastpath uses of kfree or vfree?
> 
> I'd consider kfree performance critical for cases where it is called
> under locks. If possible the kfree is moved outside of the critical
> section, but we have rbtrees or lists that get deleted under locks and
> restructuring the code to do eg. splice and free it outside of the lock
> is not always possible.

Not just performance critical, but correctness critical.  Since kvfree()
may allocate from the vmalloc allocator, I really think that kvfree()
should assert that it's !in_atomic().  Otherwise we can get into trouble
if we end up calling vfree() and have to take the mutex.

^ permalink raw reply

* Re: [GIT] Networking
From: pr-tracker-bot @ 2020-06-17  0:50 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20200616.162551.466272432384185418.davem@davemloft.net>

The pull request you sent on Tue, 16 Jun 2020 16:25:51 -0700 (PDT):

> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net refs/heads/master

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/69119673bd50b176ded34032fadd41530fb5af21

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [PATCH] net: usb: ax88179_178a: fix packet alignment padding
From: Jeremy Kerr @ 2020-06-17  0:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, allan, freddy, pfink, linux-usb, louis
In-Reply-To: <20200616.135535.379478681934951754.davem@davemloft.net>

Hi David,

> It seems logical to me that what the chip does is align up the total
> sub-packet length to a multiple of 4 or larger, and then add those two
> prefix padding bytes.  Otherwise the prefix padding won't necessarily
> and reliably align the IP header after the link level header.

Yep, that makes sense, and is what the driver is currently doing;
between clustered packets, the header is aligned (up) to 8 bytes, then
the 2-byte padding is added to that.

For this change, I have assumed that the packet length behaviour (ie,
describing the un-padded length) is consistent between clustered
packets.

[If you have any hints for forcing clustered packets, I'll see if I can
probe the behaviour a little better to confirm]

Cheers,


Jeremy


^ permalink raw reply

* [PATCH bpf v5 2/3] selftests/bpf: make sure optvals > PAGE_SIZE are bypassed
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
In-Reply-To: <20200617010416.93086-1-sdf@google.com>

We are relying on the fact, that we can pass > sizeof(int) optvals
to the SOL_IP+IP_FREEBIND option (the kernel will take first 4 bytes).
In the BPF program we check that we can only touch PAGE_SIZE bytes,
but the real optlen is PAGE_SIZE * 2. In both cases, we override it to
some predefined value and trim the optlen.

Also, let's modify exiting IP_TOS usecase to test optlen=0 case
where BPF program just bypasses the data as is.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 .../selftests/bpf/prog_tests/sockopt_sk.c     | 46 +++++++++++++---
 .../testing/selftests/bpf/progs/sockopt_sk.c  | 54 ++++++++++++++++++-
 2 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
index 2061a6beac0f..5f54c6aec7f0 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
@@ -13,6 +13,7 @@ static int getsetsockopt(void)
 		char cc[16]; /* TCP_CA_NAME_MAX */
 	} buf = {};
 	socklen_t optlen;
+	char *big_buf = NULL;
 
 	fd = socket(AF_INET, SOCK_STREAM, 0);
 	if (fd < 0) {
@@ -22,24 +23,31 @@ static int getsetsockopt(void)
 
 	/* IP_TOS - BPF bypass */
 
-	buf.u8[0] = 0x08;
-	err = setsockopt(fd, SOL_IP, IP_TOS, &buf, 1);
+	optlen = getpagesize() * 2;
+	big_buf = calloc(1, optlen);
+	if (!big_buf) {
+		log_err("Couldn't allocate two pages");
+		goto err;
+	}
+
+	*(int *)big_buf = 0x08;
+	err = setsockopt(fd, SOL_IP, IP_TOS, big_buf, optlen);
 	if (err) {
 		log_err("Failed to call setsockopt(IP_TOS)");
 		goto err;
 	}
 
-	buf.u8[0] = 0x00;
+	memset(big_buf, 0, optlen);
 	optlen = 1;
-	err = getsockopt(fd, SOL_IP, IP_TOS, &buf, &optlen);
+	err = getsockopt(fd, SOL_IP, IP_TOS, big_buf, &optlen);
 	if (err) {
 		log_err("Failed to call getsockopt(IP_TOS)");
 		goto err;
 	}
 
-	if (buf.u8[0] != 0x08) {
-		log_err("Unexpected getsockopt(IP_TOS) buf[0] 0x%02x != 0x08",
-			buf.u8[0]);
+	if (*(int *)big_buf != 0x08) {
+		log_err("Unexpected getsockopt(IP_TOS) optval 0x%x != 0x08",
+			*(int *)big_buf);
 		goto err;
 	}
 
@@ -78,6 +86,28 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	/* IP_FREEBIND - BPF can't access optval past PAGE_SIZE */
+
+	optlen = getpagesize() * 2;
+	memset(big_buf, 0, optlen);
+
+	err = setsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, optlen);
+	if (err != 0) {
+		log_err("Failed to call setsockopt, ret=%d", err);
+		goto err;
+	}
+
+	err = getsockopt(fd, SOL_IP, IP_FREEBIND, big_buf, &optlen);
+	if (err != 0) {
+		log_err("Failed to call getsockopt, ret=%d", err);
+		goto err;
+	}
+
+	if (optlen != 1 || *(__u8 *)big_buf != 0x55) {
+		log_err("Unexpected IP_FREEBIND getsockopt, optlen=%d, optval=0x%x",
+			optlen, *(__u8 *)big_buf);
+	}
+
 	/* SO_SNDBUF is overwritten */
 
 	buf.u32 = 0x01010101;
@@ -124,9 +154,11 @@ static int getsetsockopt(void)
 		goto err;
 	}
 
+	free(big_buf);
 	close(fd);
 	return 0;
 err:
+	free(big_buf);
 	close(fd);
 	return -1;
 }
diff --git a/tools/testing/selftests/bpf/progs/sockopt_sk.c b/tools/testing/selftests/bpf/progs/sockopt_sk.c
index d5a5eeb5fb52..712df7b49cb1 100644
--- a/tools/testing/selftests/bpf/progs/sockopt_sk.c
+++ b/tools/testing/selftests/bpf/progs/sockopt_sk.c
@@ -8,6 +8,10 @@
 char _license[] SEC("license") = "GPL";
 __u32 _version SEC("version") = 1;
 
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
+
 #define SOL_CUSTOM			0xdeadbeef
 
 struct sockopt_sk {
@@ -28,12 +32,14 @@ int _getsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Not interested in SOL_SOCKET:SO_SNDBUF;
@@ -51,6 +57,26 @@ int _getsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		ctx->retval = 0; /* Reset system call return value to zero */
+
+		/* Always export 0x55 */
+		optval[0] = 0x55;
+		ctx->optlen = 1;
+
+		/* Userspace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
@@ -81,12 +107,14 @@ int _setsockopt(struct bpf_sockopt *ctx)
 	__u8 *optval = ctx->optval;
 	struct sockopt_sk *storage;
 
-	if (ctx->level == SOL_IP && ctx->optname == IP_TOS)
+	if (ctx->level == SOL_IP && ctx->optname == IP_TOS) {
 		/* Not interested in SOL_IP:IP_TOS;
 		 * let next BPF program in the cgroup chain or kernel
 		 * handle it.
 		 */
+		ctx->optlen = 0; /* bypass optval>PAGE_SIZE */
 		return 1;
+	}
 
 	if (ctx->level == SOL_SOCKET && ctx->optname == SO_SNDBUF) {
 		/* Overwrite SO_SNDBUF value */
@@ -112,6 +140,28 @@ int _setsockopt(struct bpf_sockopt *ctx)
 		return 1;
 	}
 
+	if (ctx->level == SOL_IP && ctx->optname == IP_FREEBIND) {
+		/* Original optlen is larger than PAGE_SIZE. */
+		if (ctx->optlen != PAGE_SIZE * 2)
+			return 0; /* EPERM, unexpected data size */
+
+		if (optval + 1 > optval_end)
+			return 0; /* EPERM, bounds check */
+
+		/* Make sure we can trim the buffer. */
+		optval[0] = 0;
+		ctx->optlen = 1;
+
+		/* Usepace buffer is PAGE_SIZE * 2, but BPF
+		 * program can only see the first PAGE_SIZE
+		 * bytes of data.
+		 */
+		if (optval_end - optval != PAGE_SIZE)
+			return 0; /* EPERM, unexpected data size */
+
+		return 1;
+	}
+
 	if (ctx->level != SOL_CUSTOM)
 		return 0; /* EPERM, deny everything except custom level */
 
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH bpf v5 1/3] bpf: don't return EINVAL from {get,set}sockopt when optlen > PAGE_SIZE
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev, David Laight

Attaching to these hooks can break iptables because its optval is
usually quite big, or at least bigger than the current PAGE_SIZE limit.
David also mentioned some SCTP options can be big (around 256k).

For such optvals we expose only the first PAGE_SIZE bytes to
the BPF program. BPF program has two options:
1. Set ctx->optlen to 0 to indicate that the BPF's optval
   should be ignored and the kernel should use original userspace
   value.
2. Set ctx->optlen to something that's smaller than the PAGE_SIZE.

v5:
* use ctx->optlen == 0 with trimmed buffer (Alexei Starovoitov)
* update the docs accordingly

v4:
* use temporary buffer to avoid optval == optval_end == NULL;
  this removes the corner case in the verifier that might assume
  non-zero PTR_TO_PACKET/PTR_TO_PACKET_END.

v3:
* don't increase the limit, bypass the argument

v2:
* proper comments formatting (Jakub Kicinski)

Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 kernel/bpf/cgroup.c | 53 ++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4d76f16524cc..ac53102e244a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1276,16 +1276,23 @@ static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
 
 static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
 {
-	if (unlikely(max_optlen > PAGE_SIZE) || max_optlen < 0)
+	if (unlikely(max_optlen < 0))
 		return -EINVAL;
 
+	if (unlikely(max_optlen > PAGE_SIZE)) {
+		/* We don't expose optvals that are greater than PAGE_SIZE
+		 * to the BPF program.
+		 */
+		max_optlen = PAGE_SIZE;
+	}
+
 	ctx->optval = kzalloc(max_optlen, GFP_USER);
 	if (!ctx->optval)
 		return -ENOMEM;
 
 	ctx->optval_end = ctx->optval + max_optlen;
 
-	return 0;
+	return max_optlen;
 }
 
 static void sockopt_free_buf(struct bpf_sockopt_kern *ctx)
@@ -1319,13 +1326,13 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 	 */
 	max_optlen = max_t(int, 16, *optlen);
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
+	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
 		ret = -EFAULT;
 		goto out;
 	}
@@ -1353,8 +1360,14 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 		/* export any potential modifications */
 		*level = ctx.level;
 		*optname = ctx.optname;
-		*optlen = ctx.optlen;
-		*kernel_optval = ctx.optval;
+
+		/* optlen == 0 from BPF indicates that we should
+		 * use original userspace data.
+		 */
+		if (ctx.optlen != 0) {
+			*optlen = ctx.optlen;
+			*kernel_optval = ctx.optval;
+		}
 	}
 
 out:
@@ -1385,12 +1398,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	    __cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_GETSOCKOPT))
 		return retval;
 
-	ret = sockopt_alloc_buf(&ctx, max_optlen);
-	if (ret)
-		return ret;
-
 	ctx.optlen = max_optlen;
 
+	max_optlen = sockopt_alloc_buf(&ctx, max_optlen);
+	if (max_optlen < 0)
+		return max_optlen;
+
 	if (!retval) {
 		/* If kernel getsockopt finished successfully,
 		 * copy whatever was returned to the user back
@@ -1404,10 +1417,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 			goto out;
 		}
 
-		if (ctx.optlen > max_optlen)
-			ctx.optlen = max_optlen;
-
-		if (copy_from_user(ctx.optval, optval, ctx.optlen) != 0) {
+		if (copy_from_user(ctx.optval, optval,
+				   min(ctx.optlen, max_optlen)) != 0) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1436,10 +1447,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		goto out;
 	}
 
-	if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
-	    put_user(ctx.optlen, optlen)) {
-		ret = -EFAULT;
-		goto out;
+	if (ctx.optlen != 0) {
+		if (copy_to_user(optval, ctx.optval, ctx.optlen) ||
+		    put_user(ctx.optlen, optlen)) {
+			ret = -EFAULT;
+			goto out;
+		}
 	}
 
 	ret = ctx.retval;
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH bpf v5 3/3] bpf: document optval > PAGE_SIZE behavior for sockopt hooks
From: Stanislav Fomichev @ 2020-06-17  1:04 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev
In-Reply-To: <20200617010416.93086-1-sdf@google.com>

Extend existing doc with more details about requiring ctx->optlen = 0
for handling optval > PAGE_SIZE.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 Documentation/bpf/prog_cgroup_sockopt.rst | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/bpf/prog_cgroup_sockopt.rst b/Documentation/bpf/prog_cgroup_sockopt.rst
index c47d974629ae..172f957204bf 100644
--- a/Documentation/bpf/prog_cgroup_sockopt.rst
+++ b/Documentation/bpf/prog_cgroup_sockopt.rst
@@ -86,6 +86,20 @@ then the next program in the chain (A) will see those changes,
 *not* the original input ``setsockopt`` arguments. The potentially
 modified values will be then passed down to the kernel.
 
+Large optval
+============
+When the ``optval`` is greater than the ``PAGE_SIZE``, the BPF program
+can access only the first ``PAGE_SIZE`` of that data. So it has to options:
+
+* Set ``optlen`` to zero, which indicates that the kernel should
+  use the original buffer from the userspace. Any modifications
+  done by the BPF program to the ``optval`` are ignored.
+* Set ``optlen`` to the value less than ``PAGE_SIZE``, which
+  indicates that the kernel should use BPF's trimmed ``optval``.
+
+When the BPF program returns with the ``optlen`` greater than
+``PAGE_SIZE``, the userspace will receive ``EFAULT`` errno.
+
 Example
 =======
 
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
From: Hao Luo @ 2020-06-17  1:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet
In-Reply-To: <CA+khW7iU4oT3N2fYK6ym7XtWAnyD4fmiMpkuNybrJSROJeuk8A@mail.gmail.com>

Andrii,

Do you think we need to put the kernel's variables in one single
DATASEC in vmlinux BTF? It looks like all the ksyms in the program
will be under one ".ksyms" section, so we are not able to tell whether
a ksym is from a percpu section or a .rodata section. Without this
information, if the vmlinux has multiple DATASECs, the loader may need
to traverse all of them. If vmlinux BTF has only one DATASEC, it
matches the object's BTF better.

Right now, the percpu vars are in a ".data..percpu" DATASEC in my
patch and the plan seems that we will introduce more DATASECs to hold
other data.

Please let me know your insights here. Thanks.

Hao

On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote:
>
> On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> > >
> > > Thanks, Andrii,
> > >
> > > This change looks nice! A couple of comments:
> > >
> > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
> >
> > That choice is very deliberate one. `extern const void` is the right
> > way in C language to access linker-generated symbols, for instance,
> > which is quite similar to what the intent is her. Having void type is
> > very explicit that you don't know/care about that value pointed to by
> > extern address, the only operation you can perform is to get it's
> > address.
> >
> > Once we add kernel variables support, that's when types will start to
> > be specified and libbpf will do extra checks (type matching) and extra
> > work (generating ldimm64 with BTF ID, for instance), to allow C code
> > to access data pointed to by extern address.
> >
> > Switching type to u64 would be misleading in allowing C code to
> > implicitly dereference value of extern. E.g., there is a big
> > difference between:
> >
> > extern u64 bla;
> >
> > printf("%lld\n", bla); /* de-reference happens here, we get contents
> > of memory pointed to by "bla" symbol */
> >
> > printf("%p\n", &bla); /* here we get value of linker symbol/address of
> > extern variable */
> >
> > Currently I explicitly support only the latter and want to prevent the
> > former, until we have kernel variables in BTF. Using `extern void`
> > makes compiler enforce that only the &bla form is allowed. Everything
> > else is compilation error.
> >
>
> Ah, I see. I've been taking the extern variable as an actual variable
> that contains the symbol's address, which is the first case. Your
> approach makes sense. Thanks for explaining.
>
> > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
> >
> > That's a bit of a hack on my part. Variable needs to point to some
> > type, which size will match the size of datasec's varinfo entry. This
> > is checked and enforced by kernel. I'm looking for 4-byte int, because
> > it's almost guaranteed that it will be present in program's BTF and I
> > won't have to explicitly add it (it's because all BPF programs return
> > int, so it must be in program's BTF already). While 8-byte long is
> > less likely to be there.
> >
> > In the future, if we have a nicer way to extend BTF (and we will
> > soon), we can do this a bit better, but either way that .ksyms DATASEC
> > type isn't used for anything (there is no map with that DATASEC as a
> > value type), so it doesn't matter.
> >
>
> Thanks for explaining, Andrii.
>
> These explanations as comments in the code would be quite helpful, IMHO.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 2/8] libbpf: add support for extracting kernel symbol addresses
From: Andrii Nakryiko @ 2020-06-17  1:36 UTC (permalink / raw)
  To: Hao Luo
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team, Arnaldo Carvalho de Melo, Song Liu,
	Quentin Monnet
In-Reply-To: <CA+khW7gRw+4o2P+cDY+D08OPa8xH3msgQC7C5+9qMy0yashOsA@mail.gmail.com>

On Tue, Jun 16, 2020 at 6:24 PM Hao Luo <haoluo@google.com> wrote:
>
> Andrii,
>
> Do you think we need to put the kernel's variables in one single
> DATASEC in vmlinux BTF? It looks like all the ksyms in the program
> will be under one ".ksyms" section, so we are not able to tell whether
> a ksym is from a percpu section or a .rodata section. Without this
> information, if the vmlinux has multiple DATASECs, the loader may need
> to traverse all of them. If vmlinux BTF has only one DATASEC, it
> matches the object's BTF better.
>
> Right now, the percpu vars are in a ".data..percpu" DATASEC in my
> patch and the plan seems that we will introduce more DATASECs to hold
> other data.
>
> Please let me know your insights here. Thanks.

I think we should keep original DATASECs in vmlinux's BTF, so that
they match ELF sections. Otherwise BTF is going to lie and will cause
confusion down the road in the longer term.

On the BPF program side, though, I think we'll limit it to just two
special sections: .ksyms and .ksyms.percpu. libbpf will have to
traverse all vmlinux DATASECs to find corresponding variables, but
that's ok, it has to do the linear scan either way.

>
> Hao
>
> On Tue, Jun 16, 2020 at 1:05 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Mon, Jun 15, 2020 at 12:08 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jun 15, 2020 at 9:44 AM Hao Luo <haoluo@google.com> wrote:
> > > >
> > > > Thanks, Andrii,
> > > >
> > > > This change looks nice! A couple of comments:
> > > >
> > > > 1. A 'void' type variable looks slightly odd from a user's perspective. How about using 'u64' or 'void *'? Or at least, a named type, which aliases to 'void'?
> > >
> > > That choice is very deliberate one. `extern const void` is the right
> > > way in C language to access linker-generated symbols, for instance,
> > > which is quite similar to what the intent is her. Having void type is
> > > very explicit that you don't know/care about that value pointed to by
> > > extern address, the only operation you can perform is to get it's
> > > address.
> > >
> > > Once we add kernel variables support, that's when types will start to
> > > be specified and libbpf will do extra checks (type matching) and extra
> > > work (generating ldimm64 with BTF ID, for instance), to allow C code
> > > to access data pointed to by extern address.
> > >
> > > Switching type to u64 would be misleading in allowing C code to
> > > implicitly dereference value of extern. E.g., there is a big
> > > difference between:
> > >
> > > extern u64 bla;
> > >
> > > printf("%lld\n", bla); /* de-reference happens here, we get contents
> > > of memory pointed to by "bla" symbol */
> > >
> > > printf("%p\n", &bla); /* here we get value of linker symbol/address of
> > > extern variable */
> > >
> > > Currently I explicitly support only the latter and want to prevent the
> > > former, until we have kernel variables in BTF. Using `extern void`
> > > makes compiler enforce that only the &bla form is allowed. Everything
> > > else is compilation error.
> > >
> >
> > Ah, I see. I've been taking the extern variable as an actual variable
> > that contains the symbol's address, which is the first case. Your
> > approach makes sense. Thanks for explaining.
> >
> > > > 2. About the type size of ksym, IIUC, it looks strange that the values read from kallsyms have 8 bytes but their corresponding vs->size is 4 bytes and vs->type points to 4-byte int. Can we make them of the same size?
> > >
> > > That's a bit of a hack on my part. Variable needs to point to some
> > > type, which size will match the size of datasec's varinfo entry. This
> > > is checked and enforced by kernel. I'm looking for 4-byte int, because
> > > it's almost guaranteed that it will be present in program's BTF and I
> > > won't have to explicitly add it (it's because all BPF programs return
> > > int, so it must be in program's BTF already). While 8-byte long is
> > > less likely to be there.
> > >
> > > In the future, if we have a nicer way to extend BTF (and we will
> > > soon), we can do this a bit better, but either way that .ksyms DATASEC
> > > type isn't used for anything (there is no map with that DATASEC as a
> > > value type), so it doesn't matter.
> > >
> >
> > Thanks for explaining, Andrii.
> >
> > These explanations as comments in the code would be quite helpful, IMHO.

^ permalink raw reply

* [PATCH] net: Fix the arp error in some cases
From: guodeqing @ 2020-06-17  2:07 UTC (permalink / raw)
  To: davem; +Cc: kuznet, netdev, dsa, kuba, geffrey.guo

ie.,
$ ifconfig eth0 6.6.6.6 netmask 255.255.255.0

$ ip rule add from 6.6.6.6 table 6666

$ ip route add 9.9.9.9 via 6.6.6.6

$ ping -I 6.6.6.6 9.9.9.9
PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.

3 packets transmitted, 0 received, 100% packet loss, time 2079ms

$ arp
Address     HWtype  HWaddress           Flags Mask            Iface
6.6.6.6             (incomplete)                              eth0

The arp request address is error, this is because fib_table_lookup in 
fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of 
the fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.  
Here I add a check of whether this is RT_TABLE_MAIN to solve this problem. 

Fixes: 3bfd847203c6("net: Use passed in table for nexthop lookups")
Signed-off-by: guodeqing <geffrey.guo@huawei.com>
---
 net/ipv4/fib_semantics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index e53871e..1f75dc6 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1109,7 +1109,7 @@ static int fib_check_nh_v4_gw(struct net *net, struct fib_nh *nh, u32 table,
 		if (fl4.flowi4_scope < RT_SCOPE_LINK)
 			fl4.flowi4_scope = RT_SCOPE_LINK;
 
-		if (table)
+		if (table && table != RT_TABLE_MAIN)
 			tbl = fib_get_table(net, table);
 
 		if (tbl)
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH net v3 3/4] net/sched: cls_api: fix nooffloaddevcnt warning dmesg log
From: wenxu @ 2020-06-17  2:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, davem, vladbu
In-Reply-To: <20200616201750.GA27024@salvia>


On 6/17/2020 4:17 AM, Pablo Neira Ayuso wrote:
> On Tue, Jun 16, 2020 at 11:19:39AM +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> When a indr device add in offload success. The block->nooffloaddevcnt
>> should be 0. After the representor go away. When the dir device go away
>> the flow_block UNBIND operation with -EOPNOTSUPP which lead the warning
>> dmesg log. 
>>
>> The block->nooffloaddevcnt should always count for indr block.
>> even the indr block offload successful. The representor maybe
>> gone away and the ingress qdisc can work in software mode.
>>
>> block->nooffloaddevcnt warning with following dmesg log:
>>
>> [  760.667058] #####################################################
>> [  760.668186] ## TEST test-ecmp-add-vxlan-encap-disable-sriov.sh ##
>> [  760.669179] #####################################################
>> [  761.780655] :test: Fedora 30 (Thirty)
>> [  761.783794] :test: Linux reg-r-vrt-018-180 5.7.0+
>> [  761.822890] :test: NIC ens1f0 FW 16.26.6000 PCI 0000:81:00.0 DEVICE 0x1019 ConnectX-5 Ex
>> [  761.860244] mlx5_core 0000:81:00.0 ens1f0: Link up
>> [  761.880693] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
>> [  762.059732] mlx5_core 0000:81:00.1 ens1f1: Link up
>> [  762.234341] :test: unbind vfs of ens1f0
>> [  762.257825] :test: Change ens1f0 eswitch (0000:81:00.0) mode to switchdev
>> [  762.291363] :test: unbind vfs of ens1f1
>> [  762.306914] :test: Change ens1f1 eswitch (0000:81:00.1) mode to switchdev
>> [  762.309237] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(2), active vports(3)
>> [  763.282598] mlx5_core 0000:81:00.1: E-Switch: Supported tc offload range - chains: 4294967294, prios: 4294967295
>> [  763.362825] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>> [  763.444465] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
>> [  763.460088] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>> [  763.502586] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>> [  763.552429] ens1f1_0: renamed from eth0
>> [  763.569569] mlx5_core 0000:81:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(2), active vports(3)
>> [  763.629694] ens1f1_1: renamed from eth1
>> [  764.631552] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_0: link becomes ready
>> [  764.670841] :test: unbind vfs of ens1f0
>> [  764.681966] :test: unbind vfs of ens1f1
>> [  764.726762] mlx5_core 0000:81:00.0 ens1f0: Link up
>> [  764.766511] mlx5_core 0000:81:00.1 ens1f1: Link up
>> [  764.797325] :test: Add multipath vxlan encap rule and disable sriov
>> [  764.798544] :test: config multipath route
>> [  764.812732] mlx5_core 0000:81:00.0: lag map port 1:2 port 2:2
>> [  764.874556] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2
>> [  765.603681] :test: OK
>> [  765.659048] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_1: link becomes ready
>> [  765.675085] :test: verify rule in hw
>> [  765.694237] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
>> [  765.711892] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1: link becomes ready
>> [  766.979230] :test: OK
>> [  768.125419] :test: OK
>> [  768.127519] :test: - disable sriov ens1f1
>> [  768.131160] pci 0000:81:02.2: Removing from iommu group 75
>> [  768.132646] pci 0000:81:02.3: Removing from iommu group 76
>> [  769.179749] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
>> [  769.455627] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:1
>> [  769.703990] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>> [  769.988637] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
>> [  769.990022] :test: - disable sriov ens1f0
>> [  769.994922] pci 0000:81:00.2: Removing from iommu group 73
>> [  769.997048] pci 0000:81:00.3: Removing from iommu group 74
>> [  771.035813] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
>> [  771.339091] ------------[ cut here ]------------
>> [  771.340812] WARNING: CPU: 6 PID: 3448 at net/sched/cls_api.c:749 tcf_block_offload_unbind.isra.0+0x5c/0x60
>> [  771.341728] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlxfw act_ct nf_flow_table kvm_intel nf_nat kvm nf_conntrack irqbypass crct10dif_pclmul igb crc32_pclmul nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 crc32c_intel ghash_clmulni_intel ptp ipmi_ssif intel_cstate pps_c
>> ore ses intel_uncore mei_me iTCO_wdt joydev ipmi_si iTCO_vendor_support i2c_i801 enclosure mei ioatdma dca lpc_ich wmi ipmi_devintf pcspkr acpi_power_meter ipmi_msghandler acpi_pad ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
>> [  771.347818] CPU: 6 PID: 3448 Comm: test-ecmp-add-v Not tainted 5.7.0+ #1146
>> [  771.348727] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
>> [  771.349646] RIP: 0010:tcf_block_offload_unbind.isra.0+0x5c/0x60
>> [  771.350553] Code: 4a fd ff ff 83 f8 a1 74 0e 5b 4c 89 e7 5d 41 5c 41 5d e9 07 93 89 ff 8b 83 a0 00 00 00 8d 50 ff 89 93 a0 00 00 00 85 c0 75 df <0f> 0b eb db 0f 1f 44 00 00 41 57 41 56 41 55 41 89 cd 41 54 49 89
>> [  771.352420] RSP: 0018:ffffb33144cd3b00 EFLAGS: 00010246
>> [  771.353353] RAX: 0000000000000000 RBX: ffff8b37cf4b2800 RCX: 0000000000000000
>> [  771.354294] RDX: 00000000ffffffff RSI: ffff8b3b9aad0000 RDI: ffffffff8d5c6e20
>> [  771.355245] RBP: ffff8b37eb546948 R08: ffffffffc0b7a348 R09: ffff8b3b9aad0000
>> [  771.356189] R10: 0000000000000001 R11: ffff8b3ba7a0a1c0 R12: ffff8b37cf4b2850
>> [  771.357123] R13: ffff8b3b9aad0000 R14: ffff8b37cf4b2820 R15: ffff8b37cf4b2820
>> [  771.358039] FS:  00007f8a19b6e740(0000) GS:ffff8b3befa00000(0000) knlGS:0000000000000000
>> [  771.358965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  771.359885] CR2: 00007f3afb91c1a0 CR3: 000000045133c004 CR4: 00000000001606e0
>> [  771.360825] Call Trace:
>> [  771.361764]  __tcf_block_put+0x84/0x150
>> [  771.362712]  ingress_destroy+0x1b/0x20 [sch_ingress]
>> [  771.363658]  qdisc_destroy+0x3e/0xc0
>> [  771.364594]  dev_shutdown+0x7a/0xa5
>> [  771.365522]  rollback_registered_many+0x20d/0x530
>> [  771.366458]  ? netdev_upper_dev_unlink+0x15d/0x1c0
>> [  771.367387]  unregister_netdevice_many.part.0+0xf/0x70
>> [  771.368310]  vxlan_netdevice_event+0xa4/0x110 [vxlan]
>> [  771.369454]  notifier_call_chain+0x4c/0x70
>> [  771.370579]  rollback_registered_many+0x2f5/0x530
>> [  771.371719]  rollback_registered+0x56/0x90
>> [  771.372843]  unregister_netdevice_queue+0x73/0xb0
>> [  771.373982]  unregister_netdev+0x18/0x20
>> [  771.375168]  mlx5e_vport_rep_unload+0x56/0xc0 [mlx5_core]
>> [  771.376327]  esw_offloads_disable+0x81/0x90 [mlx5_core]
>> [  771.377512]  mlx5_eswitch_disable_locked.cold+0xcb/0x1af [mlx5_core]
>> [  771.378679]  mlx5_eswitch_disable+0x44/0x60 [mlx5_core]
>> [  771.379822]  mlx5_device_disable_sriov+0xad/0xb0 [mlx5_core]
>> [  771.380968]  mlx5_core_sriov_configure+0xc1/0xe0 [mlx5_core]
>> [  771.382087]  sriov_numvfs_store+0xfc/0x130
>> [  771.383195]  kernfs_fop_write+0xce/0x1b0
>> [  771.384302]  vfs_write+0xb6/0x1a0
>> [  771.385410]  ksys_write+0x5f/0xe0
>> [  771.386500]  do_syscall_64+0x5b/0x1d0
>> [  771.387569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  net/sched/cls_api.c | 24 ++++++++++++++----------
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index a00a203..86c3937 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -671,25 +671,29 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
>>  				 struct netlink_ext_ack *extack)
>>  {
>>  	struct flow_block_offload bo = {};
>> -	int err;
>>  
>>  	tcf_block_offload_init(&bo, dev, command, ei->binder_type,
>>  			       &block->flow_block, tcf_block_shared(block),
>>  			       extack);
>>  
>> -	if (dev->netdev_ops->ndo_setup_tc)
>> +	if (dev->netdev_ops->ndo_setup_tc) {
>> +		int err;
>> +
>>  		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>> -	else
>> -		err = flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, block,
>> -						  &bo, tc_block_indr_cleanup);
>> +		if (err < 0) {
>> +			if (err != -EOPNOTSUPP)
>> +				NL_SET_ERR_MSG(extack, "Driver ndo_setup_tc failed");
>> +			return err;
>> +		}
>>  
>> -	if (err < 0) {
>> -		if (err != -EOPNOTSUPP)
>> -			NL_SET_ERR_MSG(extack, "Driver ndo_setup_tc failed");
>> -		return err;
>> +		return tcf_block_setup(block, &bo);
>>  	}
>>  
>> -	return tcf_block_setup(block, &bo);
>> +	flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, block, &bo,
>> +				    tc_block_indr_cleanup);
>> +	tcf_block_setup(block, &bo);
>> +
>> +	return -EOPNOTSUPP;
> So tcf_block_offload_cmd() always return -EOPNOTSUPP for _BIND and
> _UNBIND operations after this patch ?

yes, The original design for tc subsystem block->nooffloaddevcnt should

always inc for indr tunnel.

>

^ permalink raw reply

* Re: [PATCH net v3 3/4] net/sched: cls_api: fix nooffloaddevcnt warning dmesg log
From: wenxu @ 2020-06-17  2:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, davem, vladbu
In-Reply-To: <20200616203023.GA26605@salvia>


On 6/17/2020 4:30 AM, Pablo Neira Ayuso wrote:
> On Tue, Jun 16, 2020 at 10:17:50PM +0200, Pablo Neira Ayuso wrote:
>> On Tue, Jun 16, 2020 at 11:19:39AM +0800, wenxu@ucloud.cn wrote:
>>> From: wenxu <wenxu@ucloud.cn>
>>>
>>> When a indr device add in offload success. The block->nooffloaddevcnt
>>> should be 0. After the representor go away. When the dir device go away
>>> the flow_block UNBIND operation with -EOPNOTSUPP which lead the warning
>>> dmesg log. 
>>>
>>> The block->nooffloaddevcnt should always count for indr block.
>>> even the indr block offload successful. The representor maybe
>>> gone away and the ingress qdisc can work in software mode.
>>>
>>> block->nooffloaddevcnt warning with following dmesg log:
>>>
>>> [  760.667058] #####################################################
>>> [  760.668186] ## TEST test-ecmp-add-vxlan-encap-disable-sriov.sh ##
>>> [  760.669179] #####################################################
>>> [  761.780655] :test: Fedora 30 (Thirty)
>>> [  761.783794] :test: Linux reg-r-vrt-018-180 5.7.0+
>>> [  761.822890] :test: NIC ens1f0 FW 16.26.6000 PCI 0000:81:00.0 DEVICE 0x1019 ConnectX-5 Ex
>>> [  761.860244] mlx5_core 0000:81:00.0 ens1f0: Link up
>>> [  761.880693] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
>>> [  762.059732] mlx5_core 0000:81:00.1 ens1f1: Link up
>>> [  762.234341] :test: unbind vfs of ens1f0
>>> [  762.257825] :test: Change ens1f0 eswitch (0000:81:00.0) mode to switchdev
>>> [  762.291363] :test: unbind vfs of ens1f1
>>> [  762.306914] :test: Change ens1f1 eswitch (0000:81:00.1) mode to switchdev
>>> [  762.309237] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(2), active vports(3)
>>> [  763.282598] mlx5_core 0000:81:00.1: E-Switch: Supported tc offload range - chains: 4294967294, prios: 4294967295
>>> [  763.362825] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>>> [  763.444465] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
>>> [  763.460088] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>>> [  763.502586] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>>> [  763.552429] ens1f1_0: renamed from eth0
>>> [  763.569569] mlx5_core 0000:81:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(2), active vports(3)
>>> [  763.629694] ens1f1_1: renamed from eth1
>>> [  764.631552] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_0: link becomes ready
>>> [  764.670841] :test: unbind vfs of ens1f0
>>> [  764.681966] :test: unbind vfs of ens1f1
>>> [  764.726762] mlx5_core 0000:81:00.0 ens1f0: Link up
>>> [  764.766511] mlx5_core 0000:81:00.1 ens1f1: Link up
>>> [  764.797325] :test: Add multipath vxlan encap rule and disable sriov
>>> [  764.798544] :test: config multipath route
>>> [  764.812732] mlx5_core 0000:81:00.0: lag map port 1:2 port 2:2
>>> [  764.874556] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2
>>> [  765.603681] :test: OK
>>> [  765.659048] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_1: link becomes ready
>>> [  765.675085] :test: verify rule in hw
>>> [  765.694237] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
>>> [  765.711892] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1: link becomes ready
>>> [  766.979230] :test: OK
>>> [  768.125419] :test: OK
>>> [  768.127519] :test: - disable sriov ens1f1
>>> [  768.131160] pci 0000:81:02.2: Removing from iommu group 75
>>> [  768.132646] pci 0000:81:02.3: Removing from iommu group 76
>>> [  769.179749] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
>>> [  769.455627] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:1
>>> [  769.703990] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
>>> [  769.988637] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
>>> [  769.990022] :test: - disable sriov ens1f0
>>> [  769.994922] pci 0000:81:00.2: Removing from iommu group 73
>>> [  769.997048] pci 0000:81:00.3: Removing from iommu group 74
>>> [  771.035813] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
>>> [  771.339091] ------------[ cut here ]------------
>>> [  771.340812] WARNING: CPU: 6 PID: 3448 at net/sched/cls_api.c:749 tcf_block_offload_unbind.isra.0+0x5c/0x60
>>> [  771.341728] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlxfw act_ct nf_flow_table kvm_intel nf_nat kvm nf_conntrack irqbypass crct10dif_pclmul igb crc32_pclmul nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 crc32c_intel ghash_clmulni_intel ptp ipmi_ssif intel_cstate pps_c
>>> ore ses intel_uncore mei_me iTCO_wdt joydev ipmi_si iTCO_vendor_support i2c_i801 enclosure mei ioatdma dca lpc_ich wmi ipmi_devintf pcspkr acpi_power_meter ipmi_msghandler acpi_pad ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
>>> [  771.347818] CPU: 6 PID: 3448 Comm: test-ecmp-add-v Not tainted 5.7.0+ #1146
>>> [  771.348727] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
>>> [  771.349646] RIP: 0010:tcf_block_offload_unbind.isra.0+0x5c/0x60
>>> [  771.350553] Code: 4a fd ff ff 83 f8 a1 74 0e 5b 4c 89 e7 5d 41 5c 41 5d e9 07 93 89 ff 8b 83 a0 00 00 00 8d 50 ff 89 93 a0 00 00 00 85 c0 75 df <0f> 0b eb db 0f 1f 44 00 00 41 57 41 56 41 55 41 89 cd 41 54 49 89
>>> [  771.352420] RSP: 0018:ffffb33144cd3b00 EFLAGS: 00010246
>>> [  771.353353] RAX: 0000000000000000 RBX: ffff8b37cf4b2800 RCX: 0000000000000000
>>> [  771.354294] RDX: 00000000ffffffff RSI: ffff8b3b9aad0000 RDI: ffffffff8d5c6e20
>>> [  771.355245] RBP: ffff8b37eb546948 R08: ffffffffc0b7a348 R09: ffff8b3b9aad0000
>>> [  771.356189] R10: 0000000000000001 R11: ffff8b3ba7a0a1c0 R12: ffff8b37cf4b2850
>>> [  771.357123] R13: ffff8b3b9aad0000 R14: ffff8b37cf4b2820 R15: ffff8b37cf4b2820
>>> [  771.358039] FS:  00007f8a19b6e740(0000) GS:ffff8b3befa00000(0000) knlGS:0000000000000000
>>> [  771.358965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  771.359885] CR2: 00007f3afb91c1a0 CR3: 000000045133c004 CR4: 00000000001606e0
>>> [  771.360825] Call Trace:
>>> [  771.361764]  __tcf_block_put+0x84/0x150
>>> [  771.362712]  ingress_destroy+0x1b/0x20 [sch_ingress]
>>> [  771.363658]  qdisc_destroy+0x3e/0xc0
>>> [  771.364594]  dev_shutdown+0x7a/0xa5
>>> [  771.365522]  rollback_registered_many+0x20d/0x530
>>> [  771.366458]  ? netdev_upper_dev_unlink+0x15d/0x1c0
>>> [  771.367387]  unregister_netdevice_many.part.0+0xf/0x70
>>> [  771.368310]  vxlan_netdevice_event+0xa4/0x110 [vxlan]
>>> [  771.369454]  notifier_call_chain+0x4c/0x70
>>> [  771.370579]  rollback_registered_many+0x2f5/0x530
>>> [  771.371719]  rollback_registered+0x56/0x90
>>> [  771.372843]  unregister_netdevice_queue+0x73/0xb0
>>> [  771.373982]  unregister_netdev+0x18/0x20
>>> [  771.375168]  mlx5e_vport_rep_unload+0x56/0xc0 [mlx5_core]
>>> [  771.376327]  esw_offloads_disable+0x81/0x90 [mlx5_core]
>>> [  771.377512]  mlx5_eswitch_disable_locked.cold+0xcb/0x1af [mlx5_core]
>>> [  771.378679]  mlx5_eswitch_disable+0x44/0x60 [mlx5_core]
>>> [  771.379822]  mlx5_device_disable_sriov+0xad/0xb0 [mlx5_core]
>>> [  771.380968]  mlx5_core_sriov_configure+0xc1/0xe0 [mlx5_core]
>>> [  771.382087]  sriov_numvfs_store+0xfc/0x130
>>> [  771.383195]  kernfs_fop_write+0xce/0x1b0
>>> [  771.384302]  vfs_write+0xb6/0x1a0
>>> [  771.385410]  ksys_write+0x5f/0xe0
>>> [  771.386500]  do_syscall_64+0x5b/0x1d0
>>> [  771.387569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> Fixes: 0fdcf78d5973 ("net: use flow_indr_dev_setup_offload()")
>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>> ---
>>>  net/sched/cls_api.c | 24 ++++++++++++++----------
>>>  1 file changed, 14 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index a00a203..86c3937 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -671,25 +671,29 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
>>>  				 struct netlink_ext_ack *extack)
>>>  {
>>>  	struct flow_block_offload bo = {};
>>> -	int err;
>>>  
>>>  	tcf_block_offload_init(&bo, dev, command, ei->binder_type,
>>>  			       &block->flow_block, tcf_block_shared(block),
>>>  			       extack);
>>>  
>>> -	if (dev->netdev_ops->ndo_setup_tc)
>>> +	if (dev->netdev_ops->ndo_setup_tc) {
>>> +		int err;
>>> +
>>>  		err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_BLOCK, &bo);
>>> -	else
>>> -		err = flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, block,
>>> -						  &bo, tc_block_indr_cleanup);
>>> +		if (err < 0) {
>>> +			if (err != -EOPNOTSUPP)
>>> +				NL_SET_ERR_MSG(extack, "Driver ndo_setup_tc failed");
>>> +			return err;
>>> +		}
>>>  
>>> -	if (err < 0) {
>>> -		if (err != -EOPNOTSUPP)
>>> -			NL_SET_ERR_MSG(extack, "Driver ndo_setup_tc failed");
>>> -		return err;
>>> +		return tcf_block_setup(block, &bo);
>>>  	}
>>>  
>>> -	return tcf_block_setup(block, &bo);
>>> +	flow_indr_dev_setup_offload(dev, TC_SETUP_BLOCK, block, &bo,
>>> +				    tc_block_indr_cleanup);
>>> +	tcf_block_setup(block, &bo);
>>> +
>>> +	return -EOPNOTSUPP;
>> So tcf_block_offload_cmd() always return -EOPNOTSUPP for _BIND and
>> _UNBIND operations after this patch ?
> tcf_block_offload_unbind() is not called from the
> tc_block_indr_cleanup() path.
>
> How is this patch related to 1/4, 2/4 and 4/4 ?
The bug will be triggered only with the problem described in patch1/2 are fixed.
>

^ permalink raw reply

* Re: [PATCH net v3 2/4] flow_offload: fix incorrect cb_priv check for flow_block_cb
From: wenxu @ 2020-06-17  2:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netdev, davem, vladbu
In-Reply-To: <20200616201348.GB26932@salvia>


On 6/17/2020 4:13 AM, Pablo Neira Ayuso wrote:
> On Tue, Jun 16, 2020 at 11:19:38AM +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> In the function __flow_block_indr_cleanup, The match stataments
>> this->cb_priv == cb_priv is always false, the flow_block_cb->cb_priv
>> is totally different data with the flow_indr_dev->cb_priv.
>>
>> Store the representor cb_priv to the flow_block_cb->indr.cb_priv in
>> the driver.
>>
>> Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>> ---
>>  drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c        | 1 +
>>  drivers/net/ethernet/mellanox/mlx5/core/en/rep/tc.c | 2 +-
>>  drivers/net/ethernet/netronome/nfp/flower/offload.c | 1 +
>>  include/net/flow_offload.h                          | 1 +
>>  net/core/flow_offload.c                             | 2 +-
>>  5 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>> index ef7f6bc..042c285 100644
>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
>> @@ -1918,6 +1918,7 @@ static int bnxt_tc_setup_indr_block(struct net_device *netdev, struct bnxt *bp,
>>  
>>  		flow_block_cb_add(block_cb, f);
>>  		list_add_tail(&block_cb->driver_list, &bnxt_block_cb_list);
>> +		block_cb->indr.cb_priv = bp;
> cb_indent ?
>
> Why are you splitting the fix in multiple patches? This makes it
> harder to review.
>
> I think patch 1/4, 2/4 and 4/4 belong to the same logical change?
> Collapse them.

I think patch 1/4, 2/4, 4/4 are different bugs, Although they are all in the new indirect

flow_block infrastructure.

patch1 fix the miss cleanup for flow_block_cb of flowtable

patch2 fix the incorrect check the cb_priv of flow_block_cb

patch4 fix the miss driver_list del in the cleanup callback

So maybe make them alone is better?


^ permalink raw reply

* Re: [PATCH net v3 2/4] flow_offload: fix incorrect cb_priv check for flow_block_cb
From: wenxu @ 2020-06-17  2:47 UTC (permalink / raw)
  To: Simon Horman; +Cc: netdev, davem, pablo, vladbu
In-Reply-To: <20200616154716.GA16382@netronome.com>


On 6/16/2020 11:47 PM, Simon Horman wrote:
> On Tue, Jun 16, 2020 at 11:18:16PM +0800, wenxu wrote:
>> 在 2020/6/16 22:34, Simon Horman 写道:
>>> On Tue, Jun 16, 2020 at 10:20:46PM +0800, wenxu wrote:
>>>> 在 2020/6/16 18:51, Simon Horman 写道:
>>>>> On Tue, Jun 16, 2020 at 11:19:38AM +0800, wenxu@ucloud.cn wrote:
>>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>>
>>>>>> In the function __flow_block_indr_cleanup, The match stataments
>>>>>> this->cb_priv == cb_priv is always false, the flow_block_cb->cb_priv
>>>>>> is totally different data with the flow_indr_dev->cb_priv.
>>>>>>
>>>>>> Store the representor cb_priv to the flow_block_cb->indr.cb_priv in
>>>>>> the driver.
>>>>>>
>>>>>> Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
>>>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>>> Hi Wenxu,
>>>>>
>>>>> I wonder if this can be resolved by using the cb_ident field of struct
>>>>> flow_block_cb.
>>>>>
>>>>> I observe that mlx5e_rep_indr_setup_block() seems to be the only call-site
>>>>> where the value of the cb_ident parameter of flow_block_cb_alloc() is
>>>>> per-block rather than per-device. So part of my proposal is to change
>>>>> that.
>>>> I check all the xxdriver_indr_setup_block. It seems all the cb_ident parameter of
>>>>
>>>> flow_block_cb_alloc is per-block. Both in the nfp_flower_setup_indr_tc_block
>>>>
>>>> and bnxt_tc_setup_indr_block.
>>>>
>>>>
>>>> nfp_flower_setup_indr_tc_block:
>>>>
>>>> struct nfp_flower_indr_block_cb_priv *cb_priv;
>>>>
>>>> block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
>>>>                                                cb_priv, cb_priv,
>>>>                                                nfp_flower_setup_indr_tc_release);
>>>>
>>>>
>>>> bnxt_tc_setup_indr_block:
>>>>
>>>> struct bnxt_flower_indr_block_cb_priv *cb_priv;
>>>>
>>>> block_cb = flow_block_cb_alloc(bnxt_tc_setup_indr_block_cb,
>>>>                                                cb_priv, cb_priv,
>>>>                                                bnxt_tc_setup_indr_rel);
>>>>
>>>>
>>>> And the function flow_block_cb_is_busy called in most place. Pass the
>>>>
>>>> parameter as cb_priv but not cb_indent .
>>> Thanks, I see that now. But I still think it would be useful to understand
>>> the purpose of cb_ident. It feels like it would lead to a clean solution
>>> to the problem you have highlighted.
>> I think The cb_ident means identify.  It is used to identify the each flow block cb.
>>
>> In the both flow_block_cb_is_busy and flow_block_cb_lookup function check
>>
>> the block_cb->cb_ident == cb_ident.
> Thanks, I think that I now see what you mean about the different scope of
> cb_ident and your proposal to allow cleanup by flow_indr_dev_unregister().
>
> I do, however, still wonder if there is a nicer way than reaching into
> the structure and manually setting block_cb->indr.cb_priv
> at each call-site.
>
> Perhaps a variant of flow_block_cb_alloc() for indirect blocks
> would be nicer?
Yes, It seems a variant of flow_block_cb_alloc() for indirect blocks is better, Thanks.
>

^ permalink raw reply

* Re: [PATCH] net: Fix the arp error in some cases
From: David Ahern @ 2020-06-17  3:10 UTC (permalink / raw)
  To: guodeqing, davem; +Cc: kuznet, netdev, dsa, kuba
In-Reply-To: <1592359636-107798-1-git-send-email-geffrey.guo@huawei.com>

On 6/16/20 8:07 PM, guodeqing wrote:
> ie.,
> $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
> 
> $ ip rule add from 6.6.6.6 table 6666
> 
> $ ip route add 9.9.9.9 via 6.6.6.6
> 
> $ ping -I 6.6.6.6 9.9.9.9
> PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
> 
> 3 packets transmitted, 0 received, 100% packet loss, time 2079ms
> 
> $ arp
> Address     HWtype  HWaddress           Flags Mask            Iface
> 6.6.6.6             (incomplete)                              eth0
> 
> The arp request address is error, this is because fib_table_lookup in 
> fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of 
> the fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.  
> Here I add a check of whether this is RT_TABLE_MAIN to solve this problem.

fib_check_nh* is only used when the route is installed into the FIB to
verify the gateway is legit. It is not used when processing arp
requests. Why then, do you believe this fixes something?

^ permalink raw reply

* Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
From: Jason Wang @ 2020-06-17  3:19 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel; +Cc: kvm, virtualization, netdev, eperezma
In-Reply-To: <20200611113404.17810-3-mst@redhat.com>


On 2020/6/11 下午7:34, Michael S. Tsirkin wrote:
>   static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
>   {
>   	kfree(vq->descs);
> @@ -394,6 +400,9 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
>   	for (i = 0; i < dev->nvqs; ++i) {
>   		vq = dev->vqs[i];
>   		vq->max_descs = dev->iov_limit;
> +		if (vhost_vq_num_batch_descs(vq) < 0) {
> +			return -EINVAL;
> +		}


This check breaks vdpa which set iov_limit to zero. Consider iov_limit 
is meaningless to vDPA, I wonder we can skip the test when device 
doesn't use worker.

Thanks


^ permalink raw reply

* Re: [PATCH net v3 2/4] flow_offload: fix incorrect cb_priv check for flow_block_cb
From: wenxu @ 2020-06-17  3:36 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Simon Horman; +Cc: netdev, davem, vladbu
In-Reply-To: <20200616203834.GA27394@salvia>


On 6/17/2020 4:38 AM, Pablo Neira Ayuso wrote:
> On Tue, Jun 16, 2020 at 05:47:17PM +0200, Simon Horman wrote:
>> On Tue, Jun 16, 2020 at 11:18:16PM +0800, wenxu wrote:
>>> 在 2020/6/16 22:34, Simon Horman 写道:
>>>> On Tue, Jun 16, 2020 at 10:20:46PM +0800, wenxu wrote:
>>>>> 在 2020/6/16 18:51, Simon Horman 写道:
>>>>>> On Tue, Jun 16, 2020 at 11:19:38AM +0800, wenxu@ucloud.cn wrote:
>>>>>>> From: wenxu <wenxu@ucloud.cn>
>>>>>>>
>>>>>>> In the function __flow_block_indr_cleanup, The match stataments
>>>>>>> this->cb_priv == cb_priv is always false, the flow_block_cb->cb_priv
>>>>>>> is totally different data with the flow_indr_dev->cb_priv.
>>>>>>>
>>>>>>> Store the representor cb_priv to the flow_block_cb->indr.cb_priv in
>>>>>>> the driver.
>>>>>>>
>>>>>>> Fixes: 1fac52da5942 ("net: flow_offload: consolidate indirect flow_block infrastructure")
>>>>>>> Signed-off-by: wenxu <wenxu@ucloud.cn>
>>>>>> Hi Wenxu,
>>>>>>
>>>>>> I wonder if this can be resolved by using the cb_ident field of struct
>>>>>> flow_block_cb.
>>>>>>
>>>>>> I observe that mlx5e_rep_indr_setup_block() seems to be the only call-site
>>>>>> where the value of the cb_ident parameter of flow_block_cb_alloc() is
>>>>>> per-block rather than per-device. So part of my proposal is to change
>>>>>> that.
>>>>> I check all the xxdriver_indr_setup_block. It seems all the cb_ident parameter of
>>>>>
>>>>> flow_block_cb_alloc is per-block. Both in the nfp_flower_setup_indr_tc_block
>>>>>
>>>>> and bnxt_tc_setup_indr_block.
>>>>>
>>>>>
>>>>> nfp_flower_setup_indr_tc_block:
>>>>>
>>>>> struct nfp_flower_indr_block_cb_priv *cb_priv;
>>>>>
>>>>> block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
>>>>>                                                cb_priv, cb_priv,
>>>>>                                                nfp_flower_setup_indr_tc_release);
>>>>>
>>>>>
>>>>> bnxt_tc_setup_indr_block:
>>>>>
>>>>> struct bnxt_flower_indr_block_cb_priv *cb_priv;
>>>>>
>>>>> block_cb = flow_block_cb_alloc(bnxt_tc_setup_indr_block_cb,
>>>>>                                                cb_priv, cb_priv,
>>>>>                                                bnxt_tc_setup_indr_rel);
>>>>>
>>>>>
>>>>> And the function flow_block_cb_is_busy called in most place. Pass the
>>>>>
>>>>> parameter as cb_priv but not cb_indent .
>>>> Thanks, I see that now. But I still think it would be useful to understand
>>>> the purpose of cb_ident. It feels like it would lead to a clean solution
>>>> to the problem you have highlighted.
>>> I think The cb_ident means identify.  It is used to identify the each flow block cb.
>>>
>>> In the both flow_block_cb_is_busy and flow_block_cb_lookup function check
>>>
>>> the block_cb->cb_ident == cb_ident.
>> Thanks, I think that I now see what you mean about the different scope of
>> cb_ident and your proposal to allow cleanup by flow_indr_dev_unregister().
>>
>> I do, however, still wonder if there is a nicer way than reaching into
>> the structure and manually setting block_cb->indr.cb_priv
>> at each call-site.
>>
>> Perhaps a variant of flow_block_cb_alloc() for indirect blocks
>> would be nicer?
> A follow up patch to add this new variant would be good. Probably
> __flow_block_indr_binding() can go away with this new variant to set
> up the indirect flow block.


Maybe __flow_block_indr_binding() can't go away. The data and cleanup callback which should

init the flow_block_indr is only in the flow_indr_dev_setup_offload. This can't be gotten in

flow_indr_block_cb_alloc.

>

^ permalink raw reply

* 答复: [PATCH] net: Fix the arp error in some cases
From: Guodeqing (A) @ 2020-06-17  3:38 UTC (permalink / raw)
  To: David Ahern, davem@davemloft.net
  Cc: kuznet@ms2.inr.ac.ru, netdev@vger.kernel.org,
	dsa@cumulusnetworks.com, kuba@kernel.org
In-Reply-To: <39780a81-8ac8-871b-2176-2102322f9321@gmail.com>

rt_set_nexthop in __mkroute_output will check the nh->nh_scope value to determine whether to use the gw or not.
		if (nh->nh_gw && nh->nh_scope == RT_SCOPE_LINK) {
			rt->rt_gateway = nh->nh_gw;
			rt->rt_uses_gateway = 1;
		}

(ip_route_output_key_hash-> ip_route_output_key_hash_rcu-> __mkroute_output-> rt_set_nexthop)

-----邮件原件-----
发件人: David Ahern [mailto:dsahern@gmail.com] 
发送时间: Wednesday, June 17, 2020 11:10
收件人: Guodeqing (A) <geffrey.guo@huawei.com>; davem@davemloft.net
抄送: kuznet@ms2.inr.ac.ru; netdev@vger.kernel.org; dsa@cumulusnetworks.com; kuba@kernel.org
主题: Re: [PATCH] net: Fix the arp error in some cases

On 6/16/20 8:07 PM, guodeqing wrote:
> ie.,
> $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
> 
> $ ip rule add from 6.6.6.6 table 6666
> 
> $ ip route add 9.9.9.9 via 6.6.6.6
> 
> $ ping -I 6.6.6.6 9.9.9.9
> PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
> 
> 3 packets transmitted, 0 received, 100% packet loss, time 2079ms
> 
> $ arp
> Address     HWtype  HWaddress           Flags Mask            Iface
> 6.6.6.6             (incomplete)                              eth0
> 
> The arp request address is error, this is because fib_table_lookup in 
> fib_check_nh lookup the destnation 9.9.9.9 nexthop, the scope of the 
> fib result is RT_SCOPE_LINK,the correct scope is RT_SCOPE_HOST.
> Here I add a check of whether this is RT_TABLE_MAIN to solve this problem.

fib_check_nh* is only used when the route is installed into the FIB to verify the gateway is legit. It is not used when processing arp requests. Why then, do you believe this fixes something?

^ permalink raw reply

* [PATCH 0/4] Bluetooth: Implement get/set device flags and device flags changed
From: Abhishek Pandit-Subedi @ 2020-06-17  4:00 UTC (permalink / raw)
  To: marcel, linux-bluetooth
  Cc: alainm, chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Johan Hedberg, netdev, linux-kernel,
	Jakub Kicinski


Hi linux-bluetooth,

This series adds support for configuring the Remote Wakeup flag on
devices by implementing Get Device Flags, Set Device Flags and Device
Flags Changed.

This was tested with some userspace changes to update the Remote Wakeup
flag (these changes will be upstreamed as Bluez patches once they're
cleaned up). I verified that Add Device generates the Device Flags
changed on all mgmt interfaces and Set Device Flags skips the one that
requested it.

This was tested on a Chromebook running kernel 5.4.

Abhishek



Abhishek Pandit-Subedi (4):
  Bluetooth: Add bdaddr_list_with_flags for classic whitelist
  Bluetooth: Replace wakeable list with flag
  Bluetooth: Replace wakeable in hci_conn_params
  Bluetooth: Add get/set device flags mgmt op

 include/net/bluetooth/hci.h      |   1 +
 include/net/bluetooth/hci_core.h |  31 ++++++-
 include/net/bluetooth/mgmt.h     |  28 +++++++
 net/bluetooth/hci_core.c         |  59 ++++++++++++-
 net/bluetooth/hci_event.c        |   8 +-
 net/bluetooth/hci_request.c      |  15 ++--
 net/bluetooth/hci_sock.c         |   1 +
 net/bluetooth/mgmt.c             | 139 ++++++++++++++++++++++++++++++-
 8 files changed, 266 insertions(+), 16 deletions(-)

-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply

* [PATCH 2/4] Bluetooth: Replace wakeable list with flag
From: Abhishek Pandit-Subedi @ 2020-06-17  4:00 UTC (permalink / raw)
  To: marcel, linux-bluetooth
  Cc: alainm, chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Johan Hedberg, netdev, linux-kernel,
	Jakub Kicinski
In-Reply-To: <20200617040022.174448-1-abhishekpandit@chromium.org>

Since the classic device list now supports flags, convert the wakeable
list into a flag on the existing device list.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
---

 include/net/bluetooth/hci_core.h | 11 ++++++++++-
 net/bluetooth/hci_core.c         |  1 -
 net/bluetooth/hci_request.c      | 12 ++++++++----
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 95a3935325bbbc..0643c737ba8528 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -143,6 +143,16 @@ struct bdaddr_list_with_flags {
 	u32 current_flags;
 };
 
+enum hci_conn_flags {
+	HCI_CONN_FLAG_REMOTE_WAKEUP,
+	HCI_CONN_FLAG_MAX
+};
+
+#define hci_conn_test_flag(nr, flags) ((flags) & (1U << nr))
+
+/* Make sure number of flags doesn't exceed sizeof(current_flags) */
+static_assert(HCI_CONN_FLAG_MAX < 32);
+
 struct bt_uuid {
 	struct list_head list;
 	u8 uuid[16];
@@ -463,7 +473,6 @@ struct hci_dev {
 	struct list_head	mgmt_pending;
 	struct list_head	blacklist;
 	struct list_head	whitelist;
-	struct list_head	wakeable;
 	struct list_head	uuids;
 	struct list_head	link_keys;
 	struct list_head	long_term_keys;
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 8a471bec2731ed..8e01afb2ee8c5c 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3499,7 +3499,6 @@ struct hci_dev *hci_alloc_dev(void)
 	INIT_LIST_HEAD(&hdev->mgmt_pending);
 	INIT_LIST_HEAD(&hdev->blacklist);
 	INIT_LIST_HEAD(&hdev->whitelist);
-	INIT_LIST_HEAD(&hdev->wakeable);
 	INIT_LIST_HEAD(&hdev->uuids);
 	INIT_LIST_HEAD(&hdev->link_keys);
 	INIT_LIST_HEAD(&hdev->long_term_keys);
diff --git a/net/bluetooth/hci_request.c b/net/bluetooth/hci_request.c
index a7f572ad38ef08..a5b53d3ea50802 100644
--- a/net/bluetooth/hci_request.c
+++ b/net/bluetooth/hci_request.c
@@ -968,15 +968,19 @@ static void hci_req_clear_event_filter(struct hci_request *req)
 
 static void hci_req_set_event_filter(struct hci_request *req)
 {
-	struct bdaddr_list *b;
+	struct bdaddr_list_with_flags *b;
 	struct hci_cp_set_event_filter f;
 	struct hci_dev *hdev = req->hdev;
-	u8 scan;
+	u8 scan = SCAN_DISABLED;
 
 	/* Always clear event filter when starting */
 	hci_req_clear_event_filter(req);
 
-	list_for_each_entry(b, &hdev->wakeable, list) {
+	list_for_each_entry(b, &hdev->whitelist, list) {
+		if (!hci_conn_test_flag(HCI_CONN_FLAG_REMOTE_WAKEUP,
+					b->current_flags))
+			continue;
+
 		memset(&f, 0, sizeof(f));
 		bacpy(&f.addr_conn_flt.bdaddr, &b->bdaddr);
 		f.flt_type = HCI_FLT_CONN_SETUP;
@@ -985,9 +989,9 @@ static void hci_req_set_event_filter(struct hci_request *req)
 
 		bt_dev_dbg(hdev, "Adding event filters for %pMR", &b->bdaddr);
 		hci_req_add(req, HCI_OP_SET_EVENT_FLT, sizeof(f), &f);
+		scan = SCAN_PAGE;
 	}
 
-	scan = !list_empty(&hdev->wakeable) ? SCAN_PAGE : SCAN_DISABLED;
 	hci_req_add(req, HCI_OP_WRITE_SCAN_ENABLE, 1, &scan);
 }
 
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH 1/4] Bluetooth: Add bdaddr_list_with_flags for classic whitelist
From: Abhishek Pandit-Subedi @ 2020-06-17  4:00 UTC (permalink / raw)
  To: marcel, linux-bluetooth
  Cc: alainm, chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Johan Hedberg, netdev, linux-kernel,
	Jakub Kicinski
In-Reply-To: <20200617040022.174448-1-abhishekpandit@chromium.org>

In order to more easily add device flags to classic devices, create
a new type of bdaddr_list that supports setting flags.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
---

 include/net/bluetooth/hci_core.h | 18 ++++++++--
 net/bluetooth/hci_core.c         | 58 ++++++++++++++++++++++++++++++++
 net/bluetooth/hci_event.c        |  8 ++---
 net/bluetooth/mgmt.c             |  5 +--
 4 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 0d5dbb6cb5a089..95a3935325bbbc 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -136,6 +136,13 @@ struct bdaddr_list_with_irk {
 	u8 local_irk[16];
 };
 
+struct bdaddr_list_with_flags {
+	struct list_head list;
+	bdaddr_t bdaddr;
+	u8 bdaddr_type;
+	u32 current_flags;
+};
+
 struct bt_uuid {
 	struct list_head list;
 	u8 uuid[16];
@@ -1169,12 +1176,19 @@ struct bdaddr_list *hci_bdaddr_list_lookup(struct list_head *list,
 struct bdaddr_list_with_irk *hci_bdaddr_list_lookup_with_irk(
 				    struct list_head *list, bdaddr_t *bdaddr,
 				    u8 type);
+struct bdaddr_list_with_flags *
+hci_bdaddr_list_lookup_with_flags(struct list_head *list, bdaddr_t *bdaddr,
+				  u8 type);
 int hci_bdaddr_list_add(struct list_head *list, bdaddr_t *bdaddr, u8 type);
 int hci_bdaddr_list_add_with_irk(struct list_head *list, bdaddr_t *bdaddr,
-					u8 type, u8 *peer_irk, u8 *local_irk);
+				 u8 type, u8 *peer_irk, u8 *local_irk);
+int hci_bdaddr_list_add_with_flags(struct list_head *list, bdaddr_t *bdaddr,
+				   u8 type, u32 flags);
 int hci_bdaddr_list_del(struct list_head *list, bdaddr_t *bdaddr, u8 type);
 int hci_bdaddr_list_del_with_irk(struct list_head *list, bdaddr_t *bdaddr,
-								u8 type);
+				 u8 type);
+int hci_bdaddr_list_del_with_flags(struct list_head *list, bdaddr_t *bdaddr,
+				   u8 type);
 void hci_bdaddr_list_clear(struct list_head *list);
 
 struct hci_conn_params *hci_conn_params_lookup(struct hci_dev *hdev,
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 4f1052a7c488e5..8a471bec2731ed 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -3023,6 +3023,20 @@ struct bdaddr_list_with_irk *hci_bdaddr_list_lookup_with_irk(
 	return NULL;
 }
 
+struct bdaddr_list_with_flags *
+hci_bdaddr_list_lookup_with_flags(struct list_head *bdaddr_list,
+				  bdaddr_t *bdaddr, u8 type)
+{
+	struct bdaddr_list_with_flags *b;
+
+	list_for_each_entry(b, bdaddr_list, list) {
+		if (!bacmp(&b->bdaddr, bdaddr) && b->bdaddr_type == type)
+			return b;
+	}
+
+	return NULL;
+}
+
 void hci_bdaddr_list_clear(struct list_head *bdaddr_list)
 {
 	struct bdaddr_list *b, *n;
@@ -3084,6 +3098,30 @@ int hci_bdaddr_list_add_with_irk(struct list_head *list, bdaddr_t *bdaddr,
 	return 0;
 }
 
+int hci_bdaddr_list_add_with_flags(struct list_head *list, bdaddr_t *bdaddr,
+				   u8 type, u32 flags)
+{
+	struct bdaddr_list_with_flags *entry;
+
+	if (!bacmp(bdaddr, BDADDR_ANY))
+		return -EBADF;
+
+	if (hci_bdaddr_list_lookup(list, bdaddr, type))
+		return -EEXIST;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	bacpy(&entry->bdaddr, bdaddr);
+	entry->bdaddr_type = type;
+	entry->current_flags = flags;
+
+	list_add(&entry->list, list);
+
+	return 0;
+}
+
 int hci_bdaddr_list_del(struct list_head *list, bdaddr_t *bdaddr, u8 type)
 {
 	struct bdaddr_list *entry;
@@ -3123,6 +3161,26 @@ int hci_bdaddr_list_del_with_irk(struct list_head *list, bdaddr_t *bdaddr,
 	return 0;
 }
 
+int hci_bdaddr_list_del_with_flags(struct list_head *list, bdaddr_t *bdaddr,
+				   u8 type)
+{
+	struct bdaddr_list_with_flags *entry;
+
+	if (!bacmp(bdaddr, BDADDR_ANY)) {
+		hci_bdaddr_list_clear(list);
+		return 0;
+	}
+
+	entry = hci_bdaddr_list_lookup_with_flags(list, bdaddr, type);
+	if (!entry)
+		return -ENOENT;
+
+	list_del(&entry->list);
+	kfree(entry);
+
+	return 0;
+}
+
 /* This function requires the caller holds hdev->lock */
 struct hci_conn_params *hci_conn_params_lookup(struct hci_dev *hdev,
 					       bdaddr_t *addr, u8 addr_type)
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index cfeaee347db32d..8981954ff4c47d 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2697,10 +2697,10 @@ static void hci_conn_request_evt(struct hci_dev *hdev, struct sk_buff *skb)
 	 */
 	if (hci_dev_test_flag(hdev, HCI_MGMT) &&
 	    !hci_dev_test_flag(hdev, HCI_CONNECTABLE) &&
-	    !hci_bdaddr_list_lookup(&hdev->whitelist, &ev->bdaddr,
-				    BDADDR_BREDR)) {
-		    hci_reject_conn(hdev, &ev->bdaddr);
-		    return;
+	    !hci_bdaddr_list_lookup_with_flags(&hdev->whitelist, &ev->bdaddr,
+					       BDADDR_BREDR)) {
+		hci_reject_conn(hdev, &ev->bdaddr);
+		return;
 	}
 
 	/* Connection accepted */
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 99fbfd467d0465..6d996e5e5bcc2d 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -5997,8 +5997,9 @@ static int add_device(struct sock *sk, struct hci_dev *hdev,
 			goto unlock;
 		}
 
-		err = hci_bdaddr_list_add(&hdev->whitelist, &cp->addr.bdaddr,
-					  cp->addr.type);
+		err = hci_bdaddr_list_add_with_flags(&hdev->whitelist,
+						     &cp->addr.bdaddr,
+						     cp->addr.type, 0);
 		if (err)
 			goto unlock;
 
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH 3/4] Bluetooth: Replace wakeable in hci_conn_params
From: Abhishek Pandit-Subedi @ 2020-06-17  4:00 UTC (permalink / raw)
  To: marcel, linux-bluetooth
  Cc: alainm, chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Johan Hedberg, netdev, linux-kernel,
	Jakub Kicinski
In-Reply-To: <20200617040022.174448-1-abhishekpandit@chromium.org>

Replace the wakeable boolean with flags in hci_conn_params and all users
of this boolean. This will be used by the get/set device flags mgmt op.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
---

 include/net/bluetooth/hci_core.h | 2 +-
 net/bluetooth/hci_request.c      | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 0643c737ba8528..6f88e5d81bd24f 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -660,7 +660,7 @@ struct hci_conn_params {
 
 	struct hci_conn *conn;
 	bool explicit_connect;
-	bool wakeable;
+	u32 current_flags;
 };
 
 extern struct list_head hci_dev_list;
diff --git a/net/bluetooth/hci_request.c b/net/bluetooth/hci_request.c
index a5b53d3ea50802..eee9c007a5fb40 100644
--- a/net/bluetooth/hci_request.c
+++ b/net/bluetooth/hci_request.c
@@ -710,7 +710,8 @@ static int add_to_white_list(struct hci_request *req,
 	}
 
 	/* During suspend, only wakeable devices can be in whitelist */
-	if (hdev->suspended && !params->wakeable)
+	if (hdev->suspended && !hci_conn_test_flag(HCI_CONN_FLAG_REMOTE_WAKEUP,
+						   params->current_flags))
 		return 0;
 
 	*num_entries += 1;
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH 4/4] Bluetooth: Add get/set device flags mgmt op
From: Abhishek Pandit-Subedi @ 2020-06-17  4:00 UTC (permalink / raw)
  To: marcel, linux-bluetooth
  Cc: alainm, chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Johan Hedberg, netdev, linux-kernel,
	Jakub Kicinski
In-Reply-To: <20200617040022.174448-1-abhishekpandit@chromium.org>

Add the get device flags and set device flags mgmt ops and the device
flags changed event. Their behavior is described in detail in
mgmt-api.txt in bluez.

Sample btmon trace when a HID device is added (trimmed to 75 chars):

@ MGMT Command: Unknown (0x0050) plen 11        {0x0001} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00                 ...........
@ MGMT Event: Unknown (0x002a) plen 15          {0x0004} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Unknown (0x002a) plen 15          {0x0003} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Unknown (0x002a) plen 15          {0x0002} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Command Compl.. (0x0001) plen 10  {0x0001} [hci0] 18:06:14.98
      Unknown (0x0050) plen 7
        Status: Success (0x00)
        90 c5 13 cd f3 cd 02                             .......
@ MGMT Command: Add Device (0x0033) plen 8      {0x0001} [hci0] 18:06:14.98
        LE Address: CD:F3:CD:13:C5:90 (Static)
        Action: Auto-connect remote device (0x02)
@ MGMT Event: Device Added (0x001a) plen 8      {0x0004} [hci0] 18:06:14.98
        LE Address: CD:F3:CD:13:C5:90 (Static)
        Action: Auto-connect remote device (0x02)
@ MGMT Event: Device Added (0x001a) plen 8      {0x0003} [hci0] 18:06:14.98
        LE Address: CD:F3:CD:13:C5:90 (Static)
        Action: Auto-connect remote device (0x02)
@ MGMT Event: Device Added (0x001a) plen 8      {0x0002} [hci0] 18:06:14.98
        LE Address: CD:F3:CD:13:C5:90 (Static)
        Action: Auto-connect remote device (0x02)
@ MGMT Event: Unknown (0x002a) plen 15          {0x0004} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Unknown (0x002a) plen 15          {0x0003} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Unknown (0x002a) plen 15          {0x0002} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............
@ MGMT Event: Unknown (0x002a) plen 15          {0x0001} [hci0] 18:06:14.98
        90 c5 13 cd f3 cd 02 01 00 00 00 01 00 00 00     ...............

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Alain Michaud <alainm@chromium.org>
---

 include/net/bluetooth/hci.h  |   1 +
 include/net/bluetooth/mgmt.h |  28 ++++++++
 net/bluetooth/hci_sock.c     |   1 +
 net/bluetooth/mgmt.c         | 134 +++++++++++++++++++++++++++++++++++
 4 files changed, 164 insertions(+)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 16ab6ce8788341..5e03aac76ad47f 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -259,6 +259,7 @@ enum {
 	HCI_MGMT_LOCAL_NAME_EVENTS,
 	HCI_MGMT_OOB_DATA_EVENTS,
 	HCI_MGMT_EXP_FEATURE_EVENTS,
+	HCI_MGMT_DEVICE_FLAGS_EVENTS,
 };
 
 /*
diff --git a/include/net/bluetooth/mgmt.h b/include/net/bluetooth/mgmt.h
index e515288f328f47..8e47b0c5fe52bb 100644
--- a/include/net/bluetooth/mgmt.h
+++ b/include/net/bluetooth/mgmt.h
@@ -720,6 +720,27 @@ struct mgmt_rp_set_exp_feature {
 #define MGMT_OP_SET_DEF_RUNTIME_CONFIG	0x004e
 #define MGMT_SET_DEF_RUNTIME_CONFIG_SIZE	0
 
+#define MGMT_OP_GET_DEVICE_FLAGS	0x004F
+#define MGMT_GET_DEVICE_FLAGS_SIZE	7
+struct mgmt_cp_get_device_flags {
+	struct mgmt_addr_info addr;
+} __packed;
+struct mgmt_rp_get_device_flags {
+	struct mgmt_addr_info addr;
+	__le32 supported_flags;
+	__le32 current_flags;
+} __packed;
+
+#define MGMT_OP_SET_DEVICE_FLAGS	0x0050
+#define MGMT_SET_DEVICE_FLAGS_SIZE	11
+struct mgmt_cp_set_device_flags {
+	struct mgmt_addr_info addr;
+	__le32 current_flags;
+} __packed;
+struct mgmt_rp_set_device_flags {
+	struct mgmt_addr_info addr;
+} __packed;
+
 #define MGMT_EV_CMD_COMPLETE		0x0001
 struct mgmt_ev_cmd_complete {
 	__le16	opcode;
@@ -951,3 +972,10 @@ struct mgmt_ev_exp_feature_changed {
 	__u8	uuid[16];
 	__le32	flags;
 } __packed;
+
+#define MGMT_EV_DEVICE_FLAGS_CHANGED		0x002a
+struct mgmt_ev_device_flags_changed {
+	struct mgmt_addr_info addr;
+	__le32 supported_flags;
+	__le32 current_flags;
+} __packed;
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
index d5627967fc254f..a7903b6206620c 100644
--- a/net/bluetooth/hci_sock.c
+++ b/net/bluetooth/hci_sock.c
@@ -1354,6 +1354,7 @@ static int hci_sock_bind(struct socket *sock, struct sockaddr *addr,
 			hci_sock_set_flag(sk, HCI_MGMT_SETTING_EVENTS);
 			hci_sock_set_flag(sk, HCI_MGMT_DEV_CLASS_EVENTS);
 			hci_sock_set_flag(sk, HCI_MGMT_LOCAL_NAME_EVENTS);
+			hci_sock_set_flag(sk, HCI_MGMT_DEVICE_FLAGS_EVENTS);
 		}
 		break;
 	}
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 6d996e5e5bcc2d..2805f662d85695 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -114,6 +114,8 @@ static const u16 mgmt_commands[] = {
 	MGMT_OP_SET_EXP_FEATURE,
 	MGMT_OP_READ_DEF_SYSTEM_CONFIG,
 	MGMT_OP_SET_DEF_SYSTEM_CONFIG,
+	MGMT_OP_GET_DEVICE_FLAGS,
+	MGMT_OP_SET_DEVICE_FLAGS,
 };
 
 static const u16 mgmt_events[] = {
@@ -154,6 +156,7 @@ static const u16 mgmt_events[] = {
 	MGMT_EV_EXT_INFO_CHANGED,
 	MGMT_EV_PHY_CONFIGURATION_CHANGED,
 	MGMT_EV_EXP_FEATURE_CHANGED,
+	MGMT_EV_DEVICE_FLAGS_CHANGED,
 };
 
 static const u16 mgmt_untrusted_commands[] = {
@@ -3853,6 +3856,122 @@ static int set_exp_feature(struct sock *sk, struct hci_dev *hdev,
 			       MGMT_STATUS_NOT_SUPPORTED);
 }
 
+#define SUPPORTED_DEVICE_FLAGS() ((1U << HCI_CONN_FLAG_MAX) - 1)
+
+static int get_device_flags(struct sock *sk, struct hci_dev *hdev, void *data,
+			    u16 data_len)
+{
+	struct mgmt_cp_get_device_flags *cp = data;
+	struct mgmt_rp_get_device_flags rp;
+	struct bdaddr_list_with_flags *br_params;
+	struct hci_conn_params *params;
+	u32 supported_flags = SUPPORTED_DEVICE_FLAGS();
+	u32 current_flags = 0;
+	u8 status = MGMT_STATUS_INVALID_PARAMS;
+
+	bt_dev_dbg(hdev, "Get device flags %pMR (type 0x%x)\n",
+		   &cp->addr.bdaddr, cp->addr.type);
+
+	if (cp->addr.type == BDADDR_BREDR) {
+		br_params = hci_bdaddr_list_lookup_with_flags(&hdev->whitelist,
+							      &cp->addr.bdaddr,
+							      cp->addr.type);
+		if (!br_params)
+			goto done;
+
+		current_flags = br_params->current_flags;
+	} else {
+		params = hci_conn_params_lookup(hdev, &cp->addr.bdaddr,
+						le_addr_type(cp->addr.type));
+
+		if (!params)
+			goto done;
+
+		current_flags = params->current_flags;
+	}
+
+	bacpy(&rp.addr.bdaddr, &cp->addr.bdaddr);
+	rp.addr.type = cp->addr.type;
+	rp.supported_flags = cpu_to_le32(supported_flags);
+	rp.current_flags = cpu_to_le32(current_flags);
+
+	status = MGMT_STATUS_SUCCESS;
+
+done:
+	return mgmt_cmd_complete(sk, hdev->id, MGMT_OP_GET_DEVICE_FLAGS, status,
+				&rp, sizeof(rp));
+}
+
+static int device_flags_changed(struct hci_dev *hdev, bdaddr_t *bdaddr,
+				u8 bdaddr_type, u32 supported_flags,
+				u32 current_flags, struct sock *skip)
+{
+	struct mgmt_ev_device_flags_changed ev;
+
+	bacpy(&ev.addr.bdaddr, bdaddr);
+	ev.addr.type = bdaddr_type;
+	ev.supported_flags = cpu_to_le32(supported_flags);
+	ev.current_flags = cpu_to_le32(current_flags);
+
+	return mgmt_limited_event(MGMT_EV_DEVICE_FLAGS_CHANGED, hdev, &ev,
+				  sizeof(ev), HCI_MGMT_DEVICE_FLAGS_EVENTS,
+				  skip);
+}
+
+static int set_device_flags(struct sock *sk, struct hci_dev *hdev, void *data,
+			    u16 len)
+{
+	struct mgmt_cp_set_device_flags *cp = data;
+	struct bdaddr_list_with_flags *br_params;
+	struct hci_conn_params *params;
+	u8 status = MGMT_STATUS_INVALID_PARAMS;
+	u32 supported_flags = SUPPORTED_DEVICE_FLAGS();
+	u32 current_flags = __le32_to_cpu(cp->current_flags);
+
+	bt_dev_dbg(hdev, "Set device flags %pMR (type 0x%x) = 0x%x",
+		   &cp->addr.bdaddr, cp->addr.type,
+		   __le32_to_cpu(current_flags));
+
+	if ((supported_flags | current_flags) != supported_flags) {
+		bt_dev_warn(hdev, "Bad flag given (0x%x) vs supported (0x%0x)",
+			    current_flags, supported_flags);
+		goto done;
+	}
+
+	if (cp->addr.type == BDADDR_BREDR) {
+		br_params = hci_bdaddr_list_lookup_with_flags(&hdev->whitelist,
+							      &cp->addr.bdaddr,
+							      cp->addr.type);
+
+		if (br_params) {
+			br_params->current_flags = current_flags;
+			status = MGMT_STATUS_SUCCESS;
+		} else {
+			bt_dev_warn(hdev, "No such BR/EDR device %pMR (0x%x)",
+				    &cp->addr.bdaddr, cp->addr.type);
+		}
+	} else {
+		params = hci_conn_params_lookup(hdev, &cp->addr.bdaddr,
+						le_addr_type(cp->addr.type));
+		if (params) {
+			params->current_flags = current_flags;
+			status = MGMT_STATUS_SUCCESS;
+		} else {
+			bt_dev_warn(hdev, "No such LE device %pMR (0x%x)",
+				    &cp->addr.bdaddr,
+				    le_addr_type(cp->addr.type));
+		}
+	}
+
+done:
+	if (status == MGMT_STATUS_SUCCESS)
+		device_flags_changed(hdev, &cp->addr.bdaddr, cp->addr.type,
+				     supported_flags, current_flags, sk);
+
+	return mgmt_cmd_complete(sk, hdev->id, MGMT_OP_SET_DEVICE_FLAGS, status,
+				 &cp->addr, sizeof(cp->addr));
+}
+
 static void read_local_oob_data_complete(struct hci_dev *hdev, u8 status,
 				         u16 opcode, struct sk_buff *skb)
 {
@@ -5970,7 +6089,9 @@ static int add_device(struct sock *sk, struct hci_dev *hdev,
 {
 	struct mgmt_cp_add_device *cp = data;
 	u8 auto_conn, addr_type;
+	struct hci_conn_params *params;
 	int err;
+	u32 current_flags = 0;
 
 	bt_dev_dbg(hdev, "sock %p", sk);
 
@@ -6038,12 +6159,19 @@ static int add_device(struct sock *sk, struct hci_dev *hdev,
 					MGMT_STATUS_FAILED, &cp->addr,
 					sizeof(cp->addr));
 		goto unlock;
+	} else {
+		params = hci_conn_params_lookup(hdev, &cp->addr.bdaddr,
+						addr_type);
+		if (params)
+			current_flags = params->current_flags;
 	}
 
 	hci_update_background_scan(hdev);
 
 added:
 	device_added(sk, hdev, &cp->addr.bdaddr, cp->addr.type, cp->action);
+	device_flags_changed(hdev, &cp->addr.bdaddr, cp->addr.type,
+			     SUPPORTED_DEVICE_FLAGS(), current_flags, NULL);
 
 	err = mgmt_cmd_complete(sk, hdev->id, MGMT_OP_ADD_DEVICE,
 				MGMT_STATUS_SUCCESS, &cp->addr,
@@ -7306,6 +7434,12 @@ static const struct hci_mgmt_handler mgmt_handlers[] = {
 						HCI_MGMT_UNTRUSTED },
 	{ set_def_system_config,   MGMT_SET_DEF_SYSTEM_CONFIG_SIZE,
 						HCI_MGMT_VAR_LEN },
+
+	{ NULL }, /* Read default runtime config */
+	{ NULL }, /* Set default runtime config */
+
+	{ get_device_flags,        MGMT_GET_DEVICE_FLAGS_SIZE },
+	{ set_device_flags,        MGMT_SET_DEVICE_FLAGS_SIZE },
 };
 
 void mgmt_index_added(struct hci_dev *hdev)
-- 
2.27.0.290.gba653c62da-goog


^ permalink raw reply related

* [PATCH net v2] bareudp: Fixed multiproto mode configuration
From: Martin Varghese @ 2020-06-17  4:31 UTC (permalink / raw)
  To: netdev, davem; +Cc: Martin

From: Martin <martin.varghese@nokia.com>

Code to handle multiproto configuration is missing.

Signed-off-by: Martin <martin.varghese@nokia.com>
---
Changes in v2:
     - Initialization of conf structure is removed as that change is included
       in another patch.

 drivers/net/bareudp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index 5d3c691..3dd46cd 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -572,6 +572,9 @@ static int bareudp2info(struct nlattr *data[], struct bareudp_conf *conf,
 	if (data[IFLA_BAREUDP_SRCPORT_MIN])
 		conf->sport_min =  nla_get_u16(data[IFLA_BAREUDP_SRCPORT_MIN]);
 
+	if (data[IFLA_BAREUDP_MULTIPROTO_MODE])
+		conf->multi_proto_mode = true;
+
 	return 0;
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH 01/02] net: phy: marvell: Add Marvell 88E1340 support
From: Maxim Kochetkov @ 2020-06-17  4:49 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, f.fainelli, hkallweit1, linux
In-Reply-To: <20200616145459.GA197468@lunn.ch>

Yes. This is 88E1340S. My mistake.

16.06.2020 17:54, Andrew Lunn wrote:
> On Tue, Jun 16, 2020 at 10:01:11AM +0300, Maxim Kochetkov wrote:
>> Add Marvell 88E1340 support
> 
> Hi Maxim
> 
> Are you sure this is an 88E1340, not a 88E1340S?
> 
> Marvells DSDT SDK has:
> 
>      MAD_88E1340S = 0x1C,    /* 88E1340S */
>      MAD_88E1340  = 0x1e,    /* 88E1340/x0a */
>      MAD_88E1340M = 0x1f,    /* 88E1340M/x0a */
>   
>>   #define MARVELL_PHY_ID_88E1149R		0x01410e50
>>   #define MARVELL_PHY_ID_88E1240		0x01410e30
>>   #define MARVELL_PHY_ID_88E1318S		0x01410e90
>> +#define MARVELL_PHY_ID_88E1340		0x01410dc0
>>   #define MARVELL_PHY_ID_88E1116R		0x01410e40
>>   #define MARVELL_PHY_ID_88E1510		0x01410dd0
>>   #define MARVELL_PHY_ID_88E1540		0x01410eb0
> 
> For 88E1340 i would expect the ID to be 0x01410de0
> 
>      Andrew
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox